
DOCKET NO. US 000013
CLIENT NO. PHIL06-00063

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

PATENT

In re application of:   Aninda Dasgupta
Serial No.:             09/691,334
Filed:                  October 18, 2000
For:                    SYSTEM AND METHOD FOR DISPLAYING INFORMATION
                        ON THE SCREEN OF A USER INTERFACE DEVICE UNDER
                        THE CONTROL OF A DIGITAL AUDIO PLAYBACK DEVICE
Group No.:              2126
Examiner:               LeChi Truong



MAIL STOP APPEAL BRIEF - PATENTS 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

Sir: 



CERTIFICATE OF MAILING BY FIRST CLASS MAIL 

The undersigned hereby certifies that the following documents: 

1. Appeal Brief; 

2. Fee Transmittal for FY 2005 (in duplicate); 

3. Check in the amount of $500.00 for filing fee; and 

4. Postcard receipts (2) 

relating to the above application, were deposited as "First Class Mail" with the United States Postal Service,
addressed to: MAIL STOP APPEAL BRIEF - PATENTS, Commissioner for Patents, P.O. Box 1450,
Alexandria, VA 22313-1450, on February 22, 2005.



Date: 




Mailer 



Date:



P.O. Drawer 800889 

Dallas, Texas 75380 

Phone: (972) 628-3600 

Fax: (972) 628-3616 

E-mail: wmunck@davismunck.com 



William A. Munck 
Reg. No. 39,308 




PTO/SB/17 (10-04v2)
Approved for use through 07/31/2006. OMB 0651-0032
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.



FEE TRANSMITTAL
for FY 2005

Effective 10/01/2004. Patent fees are subject to annual revision.

[ ] Applicant claims small entity status. See 37 CFR 1.27

TOTAL AMOUNT OF PAYMENT ($) 500.00



Complete if Known

Application Number:     09/691,334
Filing Date:            October 18, 2000
First Named Inventor:   Aninda Dasgupta
Examiner Name:          LeChi Truong
Art Unit:               2126
Attorney Docket No.:    US 000013 (PHIL06-00063)



METHOD OF PAYMENT (check all that apply)

[✓] Check   [ ] Credit card   [ ] Money Order   [ ] Other   [ ] None
[✓] Deposit Account:
    Deposit Account Number: 50-0208
    Deposit Account Name:   Davis Munck, P.C.

The Director is authorized to: (check all that apply)
[ ] Charge fee(s) indicated below
[✓] Credit any overpayments
[✓] Charge any additional fee(s) or any underpayment of fee(s)
[ ] Charge fee(s) indicated below, except for the filing fee
to the above-identified deposit account.

FEE CALCULATION

1. BASIC FILING FEE

Large Entity     Small Entity
Code   Fee ($)   Code   Fee ($)   Fee Description           Fee Paid
1001   790       2001   395       Utility filing fee
1002   350       2002   175       Design filing fee
1003   550       2003   275       Plant filing fee
1004   790       2004   395       Reissue filing fee
1005   160       2005   80        Provisional filing fee

SUBTOTAL (1) ($) -0-

2. EXTRA CLAIM FEES FOR UTILITY AND REISSUE

                                Extra Claims   Fee from below   Fee Paid
Total Claims         [ ] - 20** = [ ]        x [ ]            = [ ]
Independent Claims   [ ] - 3**  = [ ]        x [ ]            = [ ]
Multiple Dependent                             [ ]            = [ ]

Large Entity     Small Entity
Code   Fee ($)   Code   Fee ($)   Fee Description
1202   18        2202   9         Claims in excess of 20
1201   88        2201   44        Independent claims in excess of 3
1203   300       2203   150       Multiple dependent claim, if not paid
1204   88        2204   44        ** Reissue independent claims over original patent
1205   18        2205   9         ** Reissue claims in excess of 20 and over original patent

SUBTOTAL (2) ($) -0-

**or number previously paid, if greater; For Reissues, see above

FEE CALCULATION (continued)

3. ADDITIONAL FEES

Large Entity     Small Entity
Code   Fee ($)   Code   Fee ($)   Fee Description
1051   130       2051   65        Surcharge - late filing fee or oath
1052   50        2052   25        Surcharge - late provisional filing fee or cover sheet
1053   130       1053   130       Non-English specification
1812   2,520     1812   2,520     For filing a request for ex parte reexamination
1804   920*      1804   920*      Requesting publication of SIR prior to Examiner action
1805   1,840*    1805   1,840*    Requesting publication of SIR after Examiner action
1251   110       2251   55        Extension for reply within first month
1252   430       2252   215       Extension for reply within second month
1253   980       2253   490       Extension for reply within third month
1254   1,530     2254   765       Extension for reply within fourth month
1255   2,080     2255   1,040     Extension for reply within fifth month
1401   340       2401   170       Notice of Appeal
1402   340       2402   170       Filing a brief in support of an appeal
1403   300       2403   150       Request for oral hearing
1451   1,510     1451   1,510     Petition to institute a public use proceeding
1452   110       2452   55        Petition to revive - unavoidable
1453   1,370     2453   685       Petition to revive - unintentional
1501   1,370     2501   685       Utility issue fee (or reissue)
1502   490       2502   245       Design issue fee
1503   660       2503   330       Plant issue fee
1460   130       1460   130       Petitions to the Commissioner
1807   50        1807   50        Processing fee under 37 CFR 1.17(q)
1806   180       1806   180       Submission of Information Disclosure Stmt
8021   40        8021   40        Recording each patent assignment per property (times number of properties)
1809   790       2809   395       Filing a submission after final rejection (37 CFR 1.129(a))
1810   790       2810   395       For each additional invention to be examined (37 CFR 1.129(b))
1801   790       2801   395       Request for Continued Examination (RCE)
1802   900       1802   900       Request for expedited examination of a design application

Other fee (specify)

*Reduced by Basic Filing Fee Paid

Fee Paid: 500.00

SUBTOTAL (3) ($) 500.00



SUBMITTED BY (Complete (if applicable))

Name (Print/Type):   William A. Munck
Registration No.:    39,308
Telephone:           972-628-3600
Signature:
Date:



WARNING: Information on this form may become public. Credit card information should not
be included on this form. Provide credit card information and authorization on PTO-2038.

This collection of information is required by 37 CFR 1.17 and 1.27. The information is required to obtain or retain a benefit by the public which is to file (and by the 
USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.14. This collection is estimated to take 12 minutes to complete, 
including gathering, preparing, and submitting the completed application form to the USPTO. Time will vary depending upon the individual case. Any comments on 
the amount of time you require to complete this form and/or suggestions for reducing this burden, should be sent to the Chief Information Officer, U.S. Patent and 
Trademark Office, U.S. Department of Commerce, P.O. Box 1450, Alexandria, VA 22313-1450. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. 
SEND TO: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450. 

If you need assistance in completing the form, call 1-800-PTO-9199 and select option 2. 



CLIENT NO. PHIL06-00063 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of:   Aninda Dasgupta
Serial No.:             09/691,334
Filed:                  October 18, 2000
For:                    SYSTEM AND METHOD FOR DISPLAYING
                        INFORMATION ON THE SCREEN OF A USER
                        INTERFACE DEVICE UNDER THE CONTROL OF A
                        DIGITAL AUDIO PLAYBACK DEVICE
Group No.:              2126
Examiner:               LeChi Truong



MAIL STOP APPEAL BRIEF - PATENTS 
Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

Sir: 

APPEAL BRIEF 

The Appellant has appealed to the Board of Patent Appeals and Interferences from the
decision of the Examiner dated September 21, 2004, finally rejecting Claims 1-24. The Appellant
filed a Notice of Appeal on December 21, 2004. The Appellant respectfully submits this brief on
appeal with the statutory fee of $500.00.



Docket No. US 000013 
Serial No. 09/691,334 
Patent 

REAL PARTY IN INTEREST 

This application is currently owned by Philips Electronics North America Corporation as 
indicated by an assignment recorded on October 18, 2000 in the Assignment Records of the United
States Patent and Trademark Office at Reel 011227, Frame 0976.

RELATED APPEALS AND INTERFERENCES 

There are no known appeals or interferences that will directly affect, be directly affected by, 
or have a bearing on the Board's decision in this pending appeal. 

STATUS OF CLAIMS 

Claims 1-24 have been rejected pursuant to a final Office Action dated September 21, 2004.
Claims 1-24 are presented for appeal. A copy of Claims 1-24 is provided in Appendix A. 

STATUS OF AMENDMENTS 

No amendments were submitted and refused entry after issuance of the final Office Action
dated September 21, 2004. 

SUMMARY OF CLAIMED SUBJECT MATTER 

Regarding Claim 1, a digital audio playback device (DAPD) 150 includes an interface 315
capable of being coupled to a processing system 105. (Application, Page 19, Lines 15-18; Page 6,
Lines 11-15). The processing system 105 is capable of executing a user interface application
program 250 that accesses and controls the digital audio playback device 150. (Application, Page 13,
Lines 15-17; Page 18, Lines 13-14). The digital audio playback device 150 also includes a memory
330 capable of storing a reverse DAPD application programming interface (API) 360. (Application,
Page 19, Lines 20-23). The digital audio playback device 150 further includes a processor 305
capable of executing the reverse DAPD API 360. (Application, Page 20, Lines 2-11). The reverse
DAPD API 360 is capable of causing the processor 305 to access and control a user interface, which
is associated with the user interface application program 250 and is displayed on a monitor screen
115 associated with the processing system 105. (Application, Page 20, Lines 11-14; Page 13, Lines
15-18).

Regarding Claim 7, a processing system 105 includes an interface 220 capable of being
coupled to a digital audio playback device (DAPD) 150. (Application, Page 18, Lines 13-14; Page 8,
Lines 4-7). The digital audio playback device 150 is capable of playing audio files stored in the
digital audio playback device 150. (Application, Page 20, Lines 2-5; Page 8, Lines 8-9). The
processing system 105 also includes a memory 230 capable of storing a user interface application
program 250 that accesses and controls the digital audio playback device 150. (Application, Page 17,
Lines 15-18; Page 12, Lines 20-22). The memory 230 is also capable of storing a reverse DAPD
application programming interface (API) 260. (Application, Page 17, Lines 15-18). The processing
system 105 further includes a processor 205 capable of executing the user interface application
program 250 and the reverse DAPD API 260. (Application, Page 17, Line 21 - Page 18, Line 3;
Page 19, Lines 1-2). The reverse DAPD API 260 is capable of communicating with the digital audio
playback device 150 and enabling the digital audio playback device 150 to access and control a user
interface, which is associated with the user interface application program 250 and is displayed on a
monitor screen 115 associated with the processing system 105. (Application, Page 20, Lines 11-14;
Page 13, Lines 15-18).

Regarding Claim 13, a method 400 of displaying information on a monitor screen 115 of a
processing system 105 is provided. (Application, Page 20, Line 21 - Page 21, Line 18). The method
400 includes executing in the processing system 105 a user interface application program 250 that
accesses and controls a digital audio playback device (DAPD) 150. (Application, Page 20, Line 23 -
Page 21, Line 2). The method 400 also includes executing a reverse DAPD application
programming interface (API) 260. (Application, Page 21, Lines 2-6). The reverse DAPD API 260
enables the digital audio playback device 150 to access and control a user interface, which is
associated with the user interface application program 250 and is displayed on the monitor screen
115. (Application, Page 21, Lines 6-10; Page 20, Lines 11-14; Page 13, Lines 15-18).

Regarding Claim 20, computer-executable instructions are stored on a removable storage
medium 180 readable by a processing system 105. (Application, Page 14, Lines 14-20). The
computer-executable instructions comprise a method of displaying information on a monitor screen
115 of the processing system 105. (Application, Page 20, Line 21 - Page 21, Line 18). The method
includes executing in the processing system 105 a user interface application program 250 that
accesses and controls a digital audio playback device (DAPD) 150. (Application, Page 20, Line 23 -
Page 21, Line 2). The method also includes executing a reverse DAPD application programming
interface (API) 260. (Application, Page 21, Lines 2-6). Executing the reverse DAPD API 260
enables the digital audio playback device 150 to access and control a user interface, which is
associated with the user interface application program 250 and is displayed on the monitor screen
115. (Application, Page 21, Lines 6-10; Page 20, Lines 11-14; Page 13, Lines 15-18).

GROUNDS OF REJECTION 

1. Claims 1-19 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over
Admitted Prior Art ("APA") in view of U.S. Patent No. 6,292,187 to Gibbs et al. ("Gibbs") and Bahl,
"Software-only Compression, Rendering, and Playback of Digital Video" ("Bahl").

2. Claims 20-24 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over
APA, Gibbs, and Bahl in further view of U.S. Patent No. 5,751,962 to Fanshier et al. ("Fanshier").

ARGUMENT 

I. GROUND OF REJECTION #1 

The rejection of Claims 1-19 under 35 U.S.C. § 103(a) is improper and should be withdrawn. 

A. OVERVIEW

Claims 1-19 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Admitted
Prior Art ("APA") in view of U.S. Patent No. 6,292,187 to Gibbs et al. ("Gibbs") and Bahl,
"Software-only Compression, Rendering, and Playback of Digital Video" ("Bahl").

For convenience, the Appellant has included a copy of the Bahl reference in Appendix B. All 
citations to the Bahl reference by the Appellant are based on this copy of the Bahl reference.

B. STANDARD

In ex parte examination of patent applications, the Patent Office bears the burden of
establishing a prima facie case of obviousness. (MPEP § 2142; In re Fritch, 972 F.2d 1260, 1262,
23 U.S.P.Q.2d 1780, 1783 (Fed. Cir. 1992)). The initial burden of establishing a prima facie basis to
deny patentability to a claimed invention is always upon the Patent Office. (MPEP § 2142; In re
Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re Piasecki, 745 F.2d
1468, 1472, 223 U.S.P.Q. 785, 788 (Fed. Cir. 1984)). Only when a prima facie case of obviousness
is established does the burden shift to the Appellant to produce evidence of nonobviousness. (MPEP
§ 2142; In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re
Rijckaert, 9 F.3d 1531, 1532, 28 U.S.P.Q.2d 1955, 1956 (Fed. Cir. 1993)). If the Patent Office does
not produce a prima facie case of unpatentability, then without more the Appellant is entitled to grant
of a patent. (In re Oetiker, 977 F.2d 1443, 1445, 24 U.S.P.Q.2d 1443, 1444 (Fed. Cir. 1992); In re
Grabiak, 769 F.2d 729, 733, 226 U.S.P.Q. 870, 873 (Fed. Cir. 1985)).

A prima facie case of obviousness is established when the teachings of the prior art itself
suggest the claimed subject matter to a person of ordinary skill in the art. (In re Bell, 991 F.2d 781,
783, 26 U.S.P.Q.2d 1529, 1531 (Fed. Cir. 1993)). To establish a prima facie case of obviousness,
three basic criteria must be met. First, there must be some suggestion or motivation, either in the
references themselves or in the knowledge generally available to one of ordinary skill in the art, to
modify the reference or to combine reference teachings. Second, there must be a reasonable
expectation of success. Finally, the prior art reference (or references when combined) must teach or
suggest all the claim limitations. The teaching or suggestion to make the claimed invention and the
reasonable expectation of success must both be found in the prior art, and not based on the
Appellant's disclosure. (MPEP § 2142).

C. THE APA REFERENCE

APA recites a system that includes a personal computer (PC) and a digital audio playback
device (DAPD). (Application, Page 2, Lines 4-11). The digital audio playback device may include a
user interface on the device itself. (Application, Page 2, Lines 6-11). The PC may also include a user
interface. (Application, Page 2, Lines 6-11). The user interface on the digital audio playback device
and the user interface on the PC both allow the user to control the digital audio playback device.
(Application, Page 2, Lines 4-6). The user interface on the PC uses an API on the digital audio
playback device to control the digital audio playback device. (Application, Page 2, Lines 4-11; Page
4, Lines 8-17).

D. THE GIBBS REFERENCE 

Gibbs recites a method and system for modifying an interface of a "broadcast application" on
a device, such as an interface on a digital television receiver. (Abstract). The interface on the device
is controlled by a set of "mattes." (Abstract). The mattes control the ways in which components in
the interface are displayed. (Abstract).

E. THE BAHL REFERENCE 

Bahl recites various video coders and decoders. (Page 57; Figures 2-3). A software video
library is accessible through an API and is used in conjunction with a video decoder. (Page 61,
"Architecture" section; Page 63, Figure 9). A video player implementing the decoder provides
various control functions, such as fast forward and reverse. (Page 63, Right column; Page 71, Right
column).



F. CLAIMS 1-19 

Claim 1 recites a digital audio playback device (DAPD), which includes: 

an external interface capable of being coupled to a connected 
processing system, said connected processing system capable of 
executing a user interface application program that accesses and 
controls said digital audio playback device via said external interface; 

a memory coupled to said external interface capable of storing 
a reverse DAPD application programming interface (API); and 

a processor coupled to said memory and said external 
interface and capable of executing said reverse DAPD API, said 
reverse DAPD API capable of causing said processor to access and 
control a user interface associated with said user interface application
program and displayed on a monitor screen associated with said 
connected processing system. 

The Examiner fails to establish that the proposed APA-Gibbs-Bahl combination discloses, 
teaches, or suggests all elements of Claim 1. In particular, the Examiner fails to establish that the 
proposed APA-Gibbs-Bahl combination discloses, teaches, or suggests bi-directional control between
a "processing system" and a "digital audio playback device" as recited in Claim 1. 

More specifically, Claim 1 recites a "user interface application program" that is associated
with a "user interface." Claim 1 also recites a "reverse DAPD application programming interface 
(API)." The user interface application program is executed by a "processing system," and the reverse
DAPD API is executed by a processor in a "digital audio playback device." In addition, the user
interface application program executed by the processing system "accesses and controls" the digital 
audio playback device, and the reverse DAPD API causes the processor in the digital audio playback 
device to "access and control" the user interface. 
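
For illustration only, the two directions of access and control described above can be sketched as follows. The sketch is in Python and is hypothetical: the class names, method names, and placeholder file names are assumptions made for this example and do not appear in the Application or the claims, and the sketch is not offered as the Application's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserInterface:
    """Hypothetical stand-in for the user interface shown on the monitor screen."""
    regions: Dict[str, str] = field(default_factory=dict)

    def show(self, region: str, content: str) -> None:
        # Record what is displayed in a named portion of the interface.
        self.regions[region] = content


@dataclass
class DigitalAudioPlaybackDevice:
    """Hypothetical device model; reverse_api_update_ui stands in for a reverse DAPD API."""
    playing: Optional[str] = None
    manufacturer_logo: str = "logo.png"  # placeholder data, assumed for this example

    def play(self, track: str) -> None:
        # Forward direction: invoked by the user interface application program.
        self.playing = track

    def reverse_api_update_ui(self, ui: UserInterface) -> None:
        # Reverse direction: the device acts on the user interface of the processing system.
        ui.show("branding", self.manufacturer_logo)
        ui.show("now_playing", self.playing or "(idle)")


@dataclass
class ProcessingSystem:
    """Hypothetical processing system that runs the user interface application program."""
    ui: UserInterface = field(default_factory=UserInterface)

    def run_ui_application(self, device: DigitalAudioPlaybackDevice, track: str) -> None:
        device.play(track)                     # processing system -> device
        device.reverse_api_update_ui(self.ui)  # device -> user interface
```

In this sketch, a system with only the `play` call would correspond to uni-directional control; the `reverse_api_update_ui` call is what supplies the second direction.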

While Claim 1 recites bi-directional access and control between a processing system and a 
digital audio playback device, the references cited by the Examiner at most disclose, teach, or 
suggest uni-directional control of an end user device. 

First, APA clearly recites that the PC controls the digital audio playback device. (Application, 
Page 2, Lines 4-6). APA lacks any mention of allowing the digital audio playback device to control a
user interface on the PC. As a result, APA only recites uni-directional control of an end user device 
(the digital audio playback device) by a PC. 

Second, Gibbs recites a system of uni-directional control of an end user device (such as a 
digital television receiver) by a network. Gibbs lacks any mention of bi-directional control between 
the network and the end user device. Gibbs simply allows a system to control the appearance of an 
interface on the end user device. At most, Gibbs would allow the PC of APA to control an interface 
on an end user device (the digital audio playback device). However, this combination of APA and 
Gibbs fails to disclose, teach, or suggest all elements of Claim 1. In particular, the APA-Gibbs 
combination fails to disclose, teach, or suggest bi-directional control between a "digital audio 
playback device" and a "processing system." 

Moreover, the interface in the end user device of Gibbs is used to control the end user device 
itself. Gibbs lacks any mention of a device that controls an interface in a separate system, where the
interface in the separate system controls the device. As a result, the APA-Gibbs combination fails to
disclose, teach, or suggest a "reverse DAPD API" in a device that allows a processor in the device to 
"access and control" a "user interface" associated with a "user interface application program" 
executed by a "processing system," where the user interface application program "accesses and 
controls" the device. 

Third, Bahl lacks any mention of bi-directional control between a digital audio playback 
device and a processing system. Bahl simply recites that a software video library is accessible 
through an API and is used with a video decoder. Bahl also recites that a video player implementing 
the decoder provides control functions such as fast forward and reverse. In other words, Bahl simply 
recites a system that may retrieve video information from a library using an API. Separately, a user 
interface is used to control the playback of the video information. Bahl lacks any mention of bi- 
directional control between the video library and an end user device (the video player). As a result, 
the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or suggest bi-directional control 
between a "processing system" and a "digital audio playback device" as recited in Claim 1. 

The Examiner asserts that bi-directional control is not recited in Claim 1. (12/23/04 Advisory
Action, Page 2, Second paragraph). While the term "bi-directional" is not recited in Claim 1, Claim
1 clearly recites bi-directional access and control between a processing system and a digital audio
playback device. For example, Claim 1 clearly recites a "user interface application program" that is
executed by a "processing system" and that "accesses and controls" a "digital audio playback
device." Claim 1 also clearly recites that a "reverse DAPD application programming interface"
allows a processor in the "digital audio playback device" to "access and control" a "user interface"
associated with the "user interface application program" (which is executed by the "processing
system"). Based on this, Claim 1 clearly recites bi-directional access and control between a
processing system and a digital audio playback device.

The Examiner also asserts that the "user interface application program" recited in Claim 1 
"did not clearly belong to any specific device such as the PC or external device or digital audio 
playback device." (12/23/04 Advisory Action, Page 2, Second paragraph). However, Claim 1 clearly 
recites that the "processing system [is] capable of executing a user interface application program." 
As a result, the Examiner's interpretation of Claim 1 is improper. 

For these reasons, the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or 
suggest all elements of Claim 1. As a result, the Examiner fails to establish a prima facie case of
obviousness against Claim 1 (and its dependent claims). 

Claims 7 and 13 both recite a "user interface application program" in a "processing system" 
that controls a "digital audio playback device." Claims 7 and 13 also recite a "reverse DAPD 
application programming interface (API)" that enables the digital audio playback device to control a 
"user interface" associated with the user interface application program. 

As described above, the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or 
suggest bi-directional control between a "processing system" and a "digital audio playback device." 
In particular, the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or suggest a "user
interface application program" in a "processing system" that controls a "digital audio playback
device" and a "reverse DAPD API" that enables the digital audio playback device to control a "user
interface" associated with the "user interface application program." At most, the proposed
APA-Gibbs-Bahl combination discloses uni-directional control of an end user device.

For these reasons, the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or 
suggest all elements of Claims 7 and 13. As a result, the Examiner fails to establish a prima facie 
case of obviousness against Claims 7 and 13 (and their dependent claims). 

Accordingly, the Appellant respectfully requests that the § 103 rejection of Claims 1-19 be 
withdrawn and that Claims 1-19 be passed to allowance. 

II. GROUND OF REJECTION #2

The rejection of Claims 20-24 under 35 U.S.C. § 103(a) is improper and should be 
withdrawn. 

A. OVERVIEW 

Claims 20-24 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over APA,
Gibbs, and Bahl in further view of U.S. Patent No. 5,751,962 to Fanshier et al. ("Fanshier").

B. CLAIMS 20-24 

Claim 20 recites computer-executable instructions stored on a removable storage medium
readable by a processing system capable of being connected to a digital audio playback device. The
computer-executable instructions comprise a method of displaying information on a monitor screen
of the connected processing system, where the method includes:

executing in the connected processing system a user interface
application program that accesses and controls the digital audio
playback device; and

executing a reverse DAPD application programming interface 
(API), wherein the step of executing the reverse DAPD API enables 
the digital audio playback device to access and control a user 
interface associated with the user interface application program and 
displayed on a monitor screen associated with the connected 
processing system. 

The Examiner fails to establish that the proposed APA-Gibbs-Bahl-Fanshier combination 
discloses, teaches, or suggests all elements of Claim 20. In particular, the Examiner fails to establish
that the proposed APA-Gibbs-Bahl-Fanshier combination discloses, teaches, or suggests bi- 
directional control between a "processing system" and a "digital audio playback device" as recited in 
Claim 20. 

More specifically, Claim 20 recites a "user interface application program" in a "processing
system" that controls a "digital audio playback device." Claim 20 also recites a "reverse DAPD 
application programming interface (API)" that enables the digital audio playback device to control a 
"user interface" associated with the "user interface application program." 

As described above, the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or 
suggest bi-directional control between a "processing system" and a "digital audio playback device." 
In particular, the proposed APA-Gibbs-Bahl combination fails to disclose, teach, or suggest a "user 
interface application program" in a "processing system" that controls a "digital audio playback 
device" and a "reverse DAPD API" that enables the digital audio playback device to control a "user 
interface" associated with the "user interface application program." 

The Examiner cites Fanshier only as disclosing instructions stored on a removable storage
medium. (09/21/04 Office Action, Page 6, Last paragraph). The Examiner does not rely upon
Fanshier as disclosing, teaching, or suggesting bi-directional control between a processing system 
and a digital audio playback device. 

For these reasons, the proposed APA-Gibbs-Bahl-Fanshier combination fails to disclose, 
teach, or suggest all elements of Claim 20. As a result, the Examiner fails to establish a prima facie 
case of obviousness against Claim 20 (and its dependent claims). 

Accordingly, the Appellant respectfully requests that the § 103 rejection of Claims 20-24 be 
withdrawn and that Claims 20-24 be passed to allowance.

SUMMARY



The Appellant has demonstrated that the present invention as claimed is clearly 
distinguishable over the prior art cited of record. Therefore, the Appellant respectfully requests the 
Board of Patent Appeals and Interferences to reverse the final rejection of the Examiner and instruct 
the Examiner to issue a notice of allowance of all claims. 

The Appellant has enclosed the appropriate fee to cover the cost of this Appeal Brief. The 
Appellant does not believe that any additional fees are due. However, the Commissioner is hereby
authorized to charge any additional fees (including any extension of time fees) or credit any 
overpayments to Davis Munck Deposit Account No. 50-0208. 



P.O. Drawer 800889 

Dallas, Texas 75380 

(972) 628-3600 (main number) 

(972) 628-3616 (fax) 

E-mail: wmunck@davismunck.com 



Respectfully submitted, 



Davis Munck, P.C. 





William A. Munck 
Registration No. 39,308

APPENDIX A
PENDING CLAIMS 

1. A digital audio playback device (DAPD) comprising:

an external interface capable of being coupled to a connected processing system, said 
connected processing system capable of executing a user interface application program that accesses 
and controls said digital audio playback device via said external interface; 

a memory coupled to said external interface capable of storing a reverse DAPD application 
programming interface (API); and 

a processor coupled to said memory and said external interface and capable of executing said 
reverse DAPD API, said reverse DAPD API capable of causing said processor to access and control 
a user interface associated with said user interface application program and displayed on a monitor 
screen associated with said connected processing system. 

2. The digital audio playback device as set forth in Claim 1 wherein said reverse DAPD 
API comprises executable instructions capable of communicating with and controlling an operation 
of said user interface application program. 

3. The digital audio playback device as set forth in Claim 1 wherein said reverse DAPD
API comprises data associated with a manufacturer of said digital audio playback device. 

4. The digital audio playback device as set forth in Claim 3 wherein said reverse DAPD 
API is capable of causing said processor to access and control at least a portion of said user interface 
to display said data in said at least a portion of said user interface displayed on said monitor screen. 

5. The digital audio playback device as set forth in Claim 4 wherein said data comprises 
a graphics file comprising a logo image associated with said manufacturer. 

6. The digital audio playback device as set forth in Claim 4 wherein said data comprises 
a Universal Resource Locator (URL) associated with an Internet web site associated with said 
manufacturer. 
7. A processing system comprising: 

an external interface capable of being coupled to a connected digital audio playback device, 
said connected digital audio playback device capable of playing audio files stored in said digital 
audio playback device; 

a memory coupled to said external interface capable of storing a user interface application 
program that accesses and controls said digital audio playback device via said external interface and 
capable of storing a reverse DAPD application programming interface (API); and 

a processor coupled to said memory and said external interface and capable of executing said 
user interface application program and said reverse DAPD API, said reverse DAPD API capable of 
communicating with said digital audio playback device and enabling said digital audio playback 
device to access and control a user interface associated with said user interface application program 
and displayed on a monitor screen associated with said processing system. 

8. The processing system as set forth in Claim 7 wherein said reverse DAPD API 
comprises executable instructions capable of communicating with and controlling an operation of 
said user interface application program. 

9. The processing system as set forth in Claim 7 wherein said reverse DAPD API 
comprises data associated with a manufacturer of said digital audio playback device. 

10. The processing system as set forth in Claim 9 wherein said reverse DAPD API is 
capable of enabling said digital audio playback device to access and control at least a portion of said 
user interface to display said data in said at least a portion of said user interface displayed on said 
monitor screen. 

11. The processing system as set forth in Claim 10 wherein said data comprises a graphics 
file comprising a logo image associated with said manufacturer. 

12. The processing system as set forth in Claim 10 wherein said data comprises a 
Universal Resource Locator (URL) associated with an Internet web site associated with said 
manufacturer. 
13. For use in association with a digital audio playback device (DAPD) and a processing 
system capable of being connected to the digital audio playback device, a method of displaying 
information on a monitor screen of the connected processing system, the method comprising the 
steps of: 

executing in the connected processing system a user interface application program that 
accesses and controls the digital audio playback device; and 

executing a reverse DAPD application programming interface (API), wherein the step of 
executing the reverse DAPD API enables the digital audio playback device to access and control a 
user interface associated with the user interface application program and displayed on a monitor 
screen associated with the connected processing system. 

14. The method as set forth in Claim 13 wherein the reverse DAPD API comprises 
executable instructions capable of communicating with and controlling an operation of the user 
interface application program. 

15. The method as set forth in Claim 13 wherein the reverse DAPD API comprises data 
associated with a manufacturer of the digital audio playback device. 

16. The method as set forth in Claim 15 wherein the step of executing the reverse DAPD 
API comprises the substep of accessing and controlling at least a portion of the user interface 
displayed on the monitor screen. 

17. The method as set forth in Claim 16 wherein the step of executing the reverse DAPD 
API comprises the substep of displaying the data in the at least a portion of the user interface. 

18. The method as set forth in Claim 17 wherein the data comprises a graphics file 
comprising a logo image associated with the manufacturer. 

19. The method as set forth in Claim 17 wherein the data comprises a Universal Resource 
Locator (URL) associated with an Internet web site associated with the manufacturer. 
20. For use in association with a digital audio playback device (DAPD) and a processing 
system capable of being connected to the digital audio playback device, computer-executable 
instructions stored on a removable storage medium readable by said processing system, the 
computer-executable instructions comprising a method of displaying information on a monitor 
screen of the connected processing system, the method comprising the steps of: 

executing in the connected processing system a user interface application program that 
accesses and controls the digital audio playback device; and 

executing a reverse DAPD application programming interface (API), wherein the step of 
executing the reverse DAPD API enables the digital audio playback device to access and control a 
user interface associated with the user interface application program and displayed on a monitor 
screen associated with the connected processing system. 

21. The computer-executable instructions stored on a removable storage medium as set 
forth in Claim 20 wherein the reverse DAPD API comprises executable instructions capable of 
communicating with and controlling an operation of the user interface application program. 

22. The computer-executable instructions stored on a removable storage medium as set 
forth in Claim 20 wherein the reverse DAPD API comprises data associated with a manufacturer of 
the digital audio playback device. 

23. The computer-executable instructions stored on a removable storage medium as set 
forth in Claim 22 wherein the step of executing the reverse DAPD API comprises the substep of 
accessing and controlling at least a portion of the user interface displayed on the monitor screen. 

24. The computer-executable instructions stored on a removable storage medium as set 
forth in Claim 23 wherein the step of executing the reverse DAPD API comprises the substep of 
displaying the data in the at least a portion of the user interface. 
Software-only Compression, Rendering, and Playback of Digital Video 

Software-only digital video involves the com- 
pression, decompression, rendering, and display 
of digital video on general-purpose computers 
without specialized hardware. Today's faster 
processors are making software-only video an 
attractive, low-cost alternative to hardware 
solutions that rely on specialized compression 
boards and graphics accelerators. This paper 
describes the building blocks behind popular 
ISO, ITU-T, and industry-standard compression 
schemes, along with some novel algorithms 
for fast video rendering and presentation. A 
platform-independent software architecture 
that organizes the functionality of compressors 
and renderers into a unifying software inter- 
face is presented. This architecture has been 
successfully implemented on the Digital UNIX, 
the OpenVMS, and Microsoft's Windows NT 
operating systems. To maximize the perfor- 
mance of codecs and renderers, issues pertain- 
ing to flow control, optimal use of available 
resources, and optimizations at the algorithmic, 
operating-system, and processor levels are con- 
sidered. The performance of these codecs on 
Alpha systems is evaluated, and the ensuing 
results validate the potential of software-only 
solutions. Finally, this paper provides a brief 
description of some sample applications built 
on top of the software architecture, including 
an innovative video screen saver and a software 
VCR capable of playing multiple, compressed 
bit streams. 




Paramvir Bahl 
Paul S. Gauthier 
Robert A. Ulichney 



Full-motion video is fast becoming commonplace to 
users of desktop computers. The rising expectations for 
low-cost, television-quality video with synchronized 
sound have been pushing manufacturers to create new, 
inexpensive, high-quality offerings. The bottlenecks 
that have been preventing the delivery of video without 
specialized hardware are being cast aside rapidly as 
faster processors, higher-bandwidth computer buses 
and networks, and larger and faster disk drives are 
being developed. As a consequence, considerable 
attention is currently being focused on efficient imple- 
mentations of flexible and extensible software solutions 
to the problems of video management and delivery. 
This paper surveys the methods and architectures used 
in software-only digital video systems. 

Due to the enormous amounts of data involved, 
compression is almost always used in the storage and 
transmission of video. The high level of information 
redundancy in video lends itself well to compression, 
and many methods have been developed to take 
advantage of this fact. While the literature is replete 
with compression methods, we focus on those that are 
recognized as standards, a requirement for open and 
interoperable systems. This paper describes the build- 
ing blocks behind popular compression schemes of 
the International Organization for Standardization 
(ISO), the International Telecommunication Union- 
Telecommunication Standardization Sector (ITU-T), 
and within the industry. 

Rendering is another enabling technology for video 
on the desktop. It is the process of scaling, color 
adjusting, quantization, and color space conversion of 
the video for final presentation on the display. As an 
example, Figure 1 shows a simple sequence of video 
decoding. In the section Video Presentation, we dis- 
cuss rendering methods, along with some novel algo- 
rithms for fast video rendering and presentation, and 
describe an implementation that parallels the tech- 
niques used in Digital's hardware video offerings. 

We follow that discussion with the section The 
Software Video Library, in which we present a com- 
mon architecture for video compression, decom- 
pression, and playback that allows integration into 
Digital's multimedia products. We then describe two 
sample applications, the Video Odyssey screen saver 
and a software-only video player. We conclude our 
paper by surveying related work in this rapidly evolv- 
ing area of software digital video. 

Digital Technical Journal, Vol. 7 No. 4 1995 

Figure 1 
Components in a Video Decoder Pipeline 
(stages shown: COMPRESSED BIT STREAM, DECOMPRESS, RENDER, DISPLAY; label: PRESENTATION) 

Video Compression Methods 

A system that compresses and decompresses video, 
whether implemented in hardware or software, is 
called a video codec (for compressor/decompressor). 
Most video codecs consist of a sequence of compo- 
nents usually connected in pipeline fashion. The codec 
designer chooses specific components based on the 
design goals. By choosing the appropriate set of build- 
ing blocks, a codec can be optimized for speed of 
decompression, reliability of transmission, better color 
reproduction, better edge retention, or to perform at 
a specific target bit rate. For example, a codec could 
be designed to trade off color quality for transmission 
bit rate by removing most of the color information 
in the data (color subsampling). Similarly a codec may 
include a simple decompression model (less process- 
ing per pixel) and a complex compression process to 
boost the playback rate at the expense of longer com- 
pression times. (Compression algorithms that take 
longer to compress than to decompress are said to be 
asymmetric.) Once the components and trade-offs 
have been chosen, the designer then fine tunes the 
codec to perform well in a specific application space 
such as teleconferencing or video browsing. 

Video Codec Building Blocks 

In this section, we present the various building blocks 
behind some popular and industry-standard video 
codecs. Knowledge of the following video codec 
components is essential for understanding the com- 
pression process and to appreciate the complexity of 
the algorithms. 

Chrominance Subsampling Video is usually described 
as being composed of a sequence of images. Each 
image is a matrix of pixels, and each pixel is repre- 
sented by three 8-bit values: a single luminance value 
(Y) that signifies brightness, and two chrominance val- 
ues (U and V, or sometimes Cb and Cr) which, taken 
together, specify a unique color. By reducing the 
amount of color information in relation to luminance 
(subsampling the chrominance), we can reduce the 
size of an image with little or no perceptual effect. The 



most common chrominance subsampling technique 
decimates the color signal by 2:1 in the horizontal 
direction. This is done either by simply throwing out 
the color information of alternate pixels or by averag- 
ing the colors of two adjacent pixels and using the 
average for the color of the pixel pair. This technique is 
commonly referred to as 4:2:2 subsampling. When 
compared to a raw 24-bit image, this reduces the data 
to two-thirds of its original size (16 bits per pixel on 
average). Decimating the color signal by 
2:1 in both the horizontal and the vertical direction 
(by ignoring color information for alternate lines in 
the image) starts to result in some perceptible loss of 
color, but the data shrinks to one-half of its original 
size (12 bits per pixel on average). This 
is referred to as 4:2:0 subsampling: for every 4 lumi- 
nance samples, there is a single color specified by a pair 
of chrominance values. The ultimate chrominance 
subsampling is to throw away all color information 
and keep only the luminance data (monochrome 
video). This not only reduces the size of the input data 
but also greatly simplifies processing for both the com- 
pressor and the decompressor, resulting in faster codec 
performance. Some teleconferencing systems allow 
the user to switch to monochrome mode to increase 
frame rate. 
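
The two subsampling modes described above can be sketched as follows. This is a minimal illustration, not any codec's actual implementation: it assumes 8-bit samples, chroma planes with even dimensions, pair averaging for the horizontal decimation, and line dropping for the vertical decimation. The function names and the sample plane are hypothetical.

```python
def subsample_422(chroma):
    """Average each horizontal pair of chroma samples (2:1 horizontally)."""
    return [[(row[x] + row[x + 1]) // 2 for x in range(0, len(row), 2)]
            for row in chroma]


def subsample_420(chroma):
    """Apply the 4:2:2 step, then keep only alternate lines (2:1 vertically)."""
    return subsample_422(chroma)[::2]


def bits_per_pixel(width, height, chroma_plane):
    """Average bits per pixel for an 8-bit Y plane plus two chroma planes
    that each have the dimensions of chroma_plane."""
    luma_samples = width * height
    chroma_samples = 2 * sum(len(row) for row in chroma_plane)
    return 8 * (luma_samples + chroma_samples) / luma_samples


# A hypothetical 4x4 chroma plane (values are illustrative only).
cb = [[100, 102, 50, 54],
      [100, 102, 50, 54],
      [20, 24, 80, 84],
      [20, 24, 80, 84]]

cb_422 = subsample_422(cb)  # 4 lines of 2 samples each
cb_420 = subsample_420(cb)  # 2 lines of 2 samples each
```

Under these assumptions, bits_per_pixel gives 24 for the full plane, 16 for 4:2:2, and 12 for 4:2:0, which matches the two-thirds and one-half figures quoted in the text.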

Transform Coding Converting a signal, video or 
otherwise, from one representation to another is the 
task of a transform coder. Transforms can be useful for 
video compression if they can convert the pixel data 
into a form in which redundant and insignificant infor- 
mation in the video's image can be isolated and 
removed. Many transforms convert the spatial (pixel) 
data into frequency coefficients that can then be selec- 
tively eliminated or quantized. Transform coders 
address three central issues in image coding: (1) decor- 
relation (converting statistically dependent image 
elements into independent spectral coefficients), 
(2) energy compaction (redistribution and localization 
of energy into a small number of coefficients), and 
(3) computational complexity. It is well documented 
that human vision is biased toward low frequencies. 
By transforming an image to the frequency domain, 
a codec can capitalize on this knowledge and remove 
or reduce the high-frequency components in the 
quantization step, effectively compressing the image. 
In addition, isolating and eliminating high-frequency 
components in an image results in noise reduc- 
tion since most noise in video, introduced during 
the digitization step or from transmission interfer- 
ence, appears as high-frequency coefficients. Thus 
transforming helps compression by decorrelating (or 
whitening) signal samples and then discarding 
nonessential information from the image. 

Unitary (or orthonormal) transforms fall into either 
of two classes: fixed or adaptive. Fixed transforms are 
independent of the input signal; adaptive transforms 
adapt to the input signal. Examples of fixed trans- 
forms include the discrete Fourier transform (DFT), 
the discrete cosine transform (DCT), the discrete sine 
transform (DST), the Haar transform, and the Walsh- 
Hadamard transform (WHT). An example of an 
adaptive transform is the Karhunen-Loeve transform 
(KLT). Thus far, no transform has been found for 
pictorial information that completely removes statisti- 
cal dependence between the transform coordinates. 
The KLT is optimum in the mean square error sense, 
and it achieves the best energy compaction; however, 
it is computationally very expensive. The WHT is the 
best in terms of computation cost since it requires only 
additions and subtractions; however, it performs 
poorly in decorrelation and energy compaction. 
A good compromise is the DCT, which is by far 
the most widely used transform in image coding. The 
DCT is closest to the KLT in the energy-packing sense, 
and, like the DFT, it has fast computation algorithms 
available for its implementation. The DCT is usually 
applied in a sliding window on the image with a com- 
mon window size of 8 pixels by 8 lines (or simply, 8 by 
8). The window size (or block size) is important: if 
it is too small, the correlation between neighboring 
pixels is not exploited; if it is too large, block bound- 
aries tend to become very visible. Transform coding 
is usually the most time-consuming step in the 
compression/decompression process. 
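
Energy compaction can be illustrated with a small sketch. The code below is not a fast algorithm of the kind referred to above; it assumes the direct orthonormal DCT-II definition, applied separably to one 8-by-8 block, and the test block (a smooth horizontal ramp) is hypothetical.

```python
import math

N = 8  # the common 8-by-8 block size


def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence (direct O(N^2) form)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(
            v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column.
    Returns coeffs[v][u] with v the vertical and u the horizontal frequency."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    return [[cols[x][y] for x in range(N)] for y in range(N)]


# A smooth horizontal ramp: neighboring pixels are strongly correlated.
block = [[16 * x for x in range(N)] for _ in range(N)]
coeffs = dct_2d(block)

total_energy = sum(c * c for row in coeffs for c in row)
low_freq_energy = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
compaction = low_freq_energy / total_energy  # share held by 4 of 64 coefficients
```

For this block, nearly all of the energy lands in the four lowest-frequency coefficients, which is the property a quantizer exploits in the next step.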

Scalar Quantization A companion to transform cod- 
ing in most video compression schemes is a scalar 
quantizer that maps a large number of input levels into 
a smaller number of output levels. Video is com- 
pressed by reducing the number of symbols that need 
to be encoded at the expense of reconstruction error. 
A quantizer acts as a control knob that trades off 
image quality for bit rate. A carefully designed quan- 
tizer provides high compression for a given quality. 
The simplest form of a scalar quantizer is a uniform 
quantizer in which the quantizer decision levels are of 
equal length or step size. Other important quantizers 
include Lloyd-Max's minimum mean square error 
(MMSE) quantizer and an entropy constraint quan- 
tizer. Pulse code modulation (PCM) and adaptive 
differential pulse code modulation (ADPCM) are 
examples of two compression schemes that rely on 
pure quantization without regard to spatial and tem- 
poral redundancies and without exploiting the non- 
linearity in the human visual system. 
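
A uniform quantizer of the kind described above can be sketched in a few lines. The step sizes and coefficient values below are hypothetical; the point is only that a larger step yields fewer distinct symbols at the cost of a larger reconstruction error.

```python
import math


def quantize(value, step):
    """Index of the uniform decision interval nearest to value."""
    return math.floor(value / step + 0.5)


def dequantize(index, step):
    """Reconstruction level for a quantizer index."""
    return index * step


def quantize_all(values, step):
    return [quantize(v, step) for v in values]


def max_error(values, step):
    """Largest reconstruction error over the given inputs."""
    return max(abs(v - dequantize(quantize(v, step), step)) for v in values)


coefficients = [448, -291, 30, -9, 4, -2, 1, 0]  # illustrative values only

coarse = quantize_all(coefficients, 16)  # fewer symbols, larger error
fine = quantize_all(coefficients, 4)     # more symbols, smaller error
```

In this sketch the reconstruction error never exceeds half the step size, so the step acts as the control knob that trades image quality for bit rate.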



Predictive Coding Unless the image is changing 
rapidly, a video sequence will normally contain 
sequences of frames that are very similar. Predictive 
coding uses this fact to reduce the data volume by 
comparing pixels in the current frame with pixels in 
the same location in the previous frame and encoding 
the difference. A simple form of predictive coding uses 
the value of a pixel in one frame to predict the value of 
the pixel in the same location in the next frame. The 
prediction error, which is the difference between 
the predicted value and the actual value of the pixel, is 
usually small. Smaller numbers can be encoded using 
fewer quantization levels and fewer coding bits. Often 
the difference is zero, which can be encoded very 
compactly. Predictive coding can also be used within 
an image frame where the predicted value of a pixel 
may be the value of its neighbor or a weighted average 
of the pixels in the region. Predictive coding works 
best if the correlation between adjacent pixels that are 
spatially as well as temporally close to each other is 
strong. Differential PCM and delta modulation (DM) 
are examples of two compression schemes in which 
the predicted error is quantized and coded. The 
decompressor recovers the signal by applying this 
error to its predicted value for the sample. Lossless 
image compression is possible if the prediction error 
is coded without being quantized. 
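
The simple interframe predictor described above, in which each pixel is predicted by the pixel at the same location in the previous frame, can be sketched as follows. Frames are represented as flat lists of hypothetical pixel values, and the prediction error is left unquantized, so the round trip is lossless.

```python
def encode_frame(current, previous):
    """Prediction error: actual value minus the value predicted from
    the same location in the previous frame."""
    return [c - p for c, p in zip(current, previous)]


def decode_frame(errors, previous):
    """Recover the frame by applying the error to the predicted value."""
    return [p + e for p, e in zip(previous, errors)]


# Two hypothetical, nearly identical frames.
previous = [10, 10, 12, 200, 201, 50, 50, 50]
current = [10, 10, 13, 200, 199, 50, 50, 51]

errors = encode_frame(current, previous)
restored = decode_frame(errors, previous)
zero_count = errors.count(0)
```

Most of the errors are zero or small, which is what allows them to be coded with fewer bits than the original pixel values.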

Vector Quantization An alternative to transform- 
based coding, vector quantization attempts to repre- 
sent clusters of pixel data (vectors) in the spatial 
domain by predetermined codes. At the encoder, 
each data vector is matched or approximated with a 
code word in the codebook, and the address or index 
of that code word is transmitted instead of the data 
vector itself. At the decoder, the index is mapped back 
to the code word, which is then used to represent the 
original data vector. Identical codebooks are needed at 
the compressor (transmitter) and the decompressor 
(receiver). The main complexity lies in the design of 
good representative codebooks and algorithms for 
finding best matches efficiently when exact matches 
are not available. Typically, vector quantization is 
applied to data that has already undergone predictive 
coding. The prediction error is mapped to a subset 
of values that are expected to occur most frequently. 
The process is called vector quantization because the 
values to be matched in the tables are usually vectors of 
two or more values. More elaborate vector quantiza- 
tion schemes are possible in which the difference data 
is searched for larger groups of commonly occurring 
values, and these groups are also mapped to single 
index values. 

The amount of compression that results from vec- 
tor quantization depends on how the values in the 
codebooks are calculated. Compression may be 
adjusted smoothly by designing a set of codebooks 
and picking the appropriate one for a given desired 
compression ratio. 
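
The encoder and decoder roles in vector quantization can be sketched as below. The codebook and the data vectors are hypothetical, the match criterion is assumed to be squared Euclidean distance, and the search is a plain exhaustive scan rather than an efficient matching algorithm.

```python
def nearest_index(vector, codebook):
    """Index of the code word closest to vector (squared Euclidean distance)."""
    def distance(code_word):
        return sum((a - b) ** 2 for a, b in zip(vector, code_word))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def vq_encode(vectors, codebook):
    """Transmit only the index of the best-matching code word."""
    return [nearest_index(v, codebook) for v in vectors]


def vq_decode(indices, codebook):
    """Map each index back to its code word (same codebook at both ends)."""
    return [codebook[i] for i in indices]


# Hypothetical codebook of two-element vectors, shared by both ends.
codebook = [(0, 0), (4, 4), (-4, -4), (8, 0)]
vectors = [(1, 0), (3, 5), (-5, -3), (7, 1), (0, -1)]

indices = vq_encode(vectors, codebook)
approximation = vq_decode(indices, codebook)
```

With four code words, each two-element vector is replaced by a 2-bit index; a larger codebook would lower the approximation error at the cost of longer indices.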

Motion Estimation and Compensation Most codecs 
that use interframe compression use a more elaborate 
form of predictive coding than described above. Most 
videos contain scenes in which one or more objects 
move across the image against a fixed background or 
in which an object is stationary against a moving back- 
ground. In both cases, many regions in a frame appear 
in the next frame but at different positions. Motion 
estimation tries to find similar regions in two frames 
and encodes the region in the second frame with a dis- 
placement vector (motion vector) that shows how 
the region has moved. The technique relies on the 
hypothesis that a change in pixel intensity from one 
frame to another is due only to translation. 

For each region (or block) in the current frame, 
a displacement vector is evaluated by matching the 
information content of the measurement window with 
a corresponding measurement window W within 
a search area S, placed in the previous frame, and by 
searching for the spatial location that minimizes the 
matching criterion d. Let L_i(x, y) represent the pixel 
intensity at location (x, y) in frame i; and if (d_x, d_y) rep- 
resents the region displacement vector for the interval 
n (= (i + n) - i), then the matching criterion is defined as 

$$ d = \min_{(d_x, d_y) \in S} \sum_{(x, y) \in W} \left\| L_i(x, y) - L_{i-n}(x - d_x, y - d_y) \right\|, \quad n \geq 1 \qquad (1) $$

The most widely used distance measures are the 
absolute value ||x|| = |x| and the quadratic norm 
||x|| = x^2. Since finding the absolute minimum is guar- 
anteed only by performing an exhaustive search of a 
series of discrete candidate displacements within 
a maximum displacement range, this process is com- 
putationally very expensive. A single displacement 
vector is assigned to all pixels within the region. 

Motion compensation is the inverse process of using 
a motion vector to determine a region of the image to 
be used as a predictor. 

Although the amount of compression resulting 
from motion estimation is large, the coding process is 
time-consuming. Fortunately, this time is needed only 
in the compression step. Decompression using motion 
estimation is relatively fast since no searching has to be 
done. The decompressor sim- 
ply uses the transmitted vector to access the region in 
the previous frame that the vector points to for data 
replenishment. Region size can vary among the codecs 
using motion estimation but is typically 16 by 16. 
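
An exhaustive block-matching search can be sketched as follows. This is an illustration under stated assumptions, not the search used by any particular codec: it uses the absolute-value distance measure, a displacement of one frame interval, a small hypothetical frame, and a 4-by-4 region instead of the typical 16-by-16 so that the example stays readable. All function names are hypothetical.

```python
def sad(cur, prev, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur and
    the block displaced by (dx, dy) in prev."""
    return sum(abs(cur[by + y][bx + x] - prev[by + y - dy][bx + x - dx])
               for y in range(size) for x in range(size))


def estimate_motion(cur, prev, bx, by, size, max_disp):
    """Exhaustive search over all displacements within +/- max_disp.
    Returns (cost, dx, dy) for the best match."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= bx - dx and bx - dx + size <= width and
                    0 <= by - dy and by - dy + size <= height):
                continue
            cost = sad(cur, prev, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def compensate(prev, bx, by, dx, dy, size):
    """Motion compensation: fetch the predictor block from the previous frame."""
    return [[prev[by + y - dy][bx + x - dx] for x in range(size)]
            for y in range(size)]


def make_frame(width, height, ox, oy):
    """Hypothetical frame: a bright 2x2 object at (ox, oy) on a dark field."""
    return [[200 if ox <= x < ox + 2 and oy <= y < oy + 2 else 10
             for x in range(width)] for y in range(height)]


prev_frame = make_frame(8, 8, 2, 2)
cur_frame = make_frame(8, 8, 3, 4)  # the object has moved by (+1, +2)

cost, dx, dy = estimate_motion(cur_frame, prev_frame,
                               bx=2, by=3, size=4, max_disp=2)
predictor = compensate(prev_frame, 2, 3, dx, dy, 4)
```

The search loop visits every candidate displacement, which is why estimation is expensive, whereas compensate performs a single block fetch, which is why decompression is fast.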



Frame/Block Skipping One technique for reducing 
data is to eliminate it entirely. In a teleconferencing sit- 
uation, for example, if the scene does not change 
(above some threshold criteria), it may be acceptable 
to not send the new frame (drop or skip the frame). 
Alternatively, if bandwidth is limited and image quality 
is important, it may be necessary to drop frames to stay 
within a bit-rate budget. Most codecs used in telecon- 
ferencing applications have the ability of temporal sub- 
sampling and are able to gracefully degrade under 
limited bandwidth situations by dropping frames. 

A second form of data elimination is spatial subsam- 
pling. The idea is similar to chrominance subsampling 
discussed previously. In most transform-based codecs, 
a block (8 by 8 or 16 by 16) is usually skipped if the 
difference between it and the previous block is below 
a predetermined threshold. The decompressor may 
reconstruct the missing pixels by using the previous 
block to predict the current block. 
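
Block skipping can be sketched as a threshold test. The sketch assumes that "the previous block" means the co-located block in the previous frame and that the difference is measured as a mean absolute difference; both choices, the threshold value, and the tiny 2-by-2 blocks are hypothetical.

```python
def block_difference(block, previous_block):
    """Mean absolute difference between two co-located blocks."""
    count = len(block) * len(block[0])
    total = sum(abs(a - b)
                for row, prev_row in zip(block, previous_block)
                for a, b in zip(row, prev_row))
    return total / count


def choose_blocks(blocks, previous_blocks, threshold):
    """Compressor side: None marks a skipped block, otherwise send the block."""
    return [None if block_difference(b, p) < threshold else b
            for b, p in zip(blocks, previous_blocks)]


def reconstruct(received, previous_blocks):
    """Decompressor side: reuse the previous block wherever one was skipped."""
    return [p if r is None else r for r, p in zip(received, previous_blocks)]


previous_blocks = [[[10, 10], [10, 10]], [[50, 50], [50, 50]]]
current_blocks = [[[10, 11], [10, 10]], [[90, 90], [90, 90]]]

sent = choose_blocks(current_blocks, previous_blocks, threshold=2.0)
shown = reconstruct(sent, previous_blocks)
```

Here the first block changes too little to be worth sending, so the decompressor repeats the previous block in its place.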

Entropy Encoding Entropy encoding is a form of sta- 
tistical coding that provides lossless compression by 
coding input samples according to their frequency of 
occurrence. The two methods used most frequently 
are Huffman coding and run-length encoding. 
Huffman coding assigns fewer bits to most frequently 
occurring symbols and more bits to the symbols that 
appear less often. Optimal Huffman tables can be gen- 
erated if the source statistics are known. Calculating 
these statistics, however, slows down the compression 
process. Consequently, predeveloped tables that have 
been tested over a wide range of source images are 
used. A second and simpler method of entropy encod- 
ing is run-length encoding in which sequences of 
identical digits are replaced with the digit and the 
number in the sequence. Like motion estimation, 
entropy encoding puts a heavier burden on the com- 
pressor than the decompressor. 
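
Both entropy coding methods can be sketched briefly. The Huffman part below computes only code lengths from the statistics of the input itself, which is an assumption made for the example; as noted above, practical codecs tend to use predeveloped tables. The symbol sequence is hypothetical.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(symbols):
    """Replace each run of identical symbols with a (symbol, run length) pair."""
    return [(s, len(list(g))) for s, g in groupby(symbols)]


def run_length_decode(pairs):
    return [s for s, n in pairs for _ in range(n)]


def huffman_code_lengths(symbols):
    """Code length in bits for each symbol, found by repeatedly merging the
    two least frequent subtrees (the standard Huffman construction)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


data = [0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 7, 0, 0, 0, 0, 0]

pairs = run_length_encode(data)
lengths = huffman_code_lengths(data)
huffman_bits = sum(lengths[s] for s in data)
```

The most frequent symbol receives the shortest code, so the 16 symbols need 19 bits here instead of the 32 bits a fixed 2-bit code would use.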

Before ending this section, we would like to mention 
that a number of other techniques, including object- 
based coding, model-based coding, segmentation- 
based coding, contour-texture oriented coding, fractal 
coding, and wavelet coding are also available to the 
codec designer. Thus far, our coverage has concen- 
trated on explaining only those techniques that have 
been used in the video compression schemes currently 
supported by Digital. In the next section, we describe 
some hybrid schemes that employ a number of the 
techniques described above; these schemes are the basis 
of several international video coding standards. 

Overview of Popular Video Compression Schemes 

The compression schemes presented in this section 
can be collectively classified as first-generation video 
coding schemes. The common assumption in all these 
methods is that there is statistical correlation between 
pixels. Each of these methods attempts to exploit this 
correlation by employing redundancy reduction tech- 
niques to achieve compression. 

Motion-JPEG Algorithm Motion-JPEG (or M-JPEG) 
compresses each frame of a video sequence using the 
ISO's Joint Photographic Experts Group (JPEG) 
continuous-tone, still-image compression standard. 
As such, it is an intraframe compression scheme. It is 
not wed to any particular subsampling format, image 
color space, or image dimensions, but most typically 
4:2:2 subsampled YCbCr, source input format (SIF, 
352 by 240) data is used. The JPEG standard specifies 
both lossy and lossless compression schemes. For 
video, only the lossy baseline DCT coding scheme has 
gained acceptance. The scheme relies on selective 
quantization of the frequency coefficients followed by 
Huffman and run-length encoding for its compres- 
sion. The standard defines a bit-stream format that 
contains both the compressed data stream and coding 
parameters such as the number of components, quan- 
tization tables, Huffman tables, and sampling factors. 
Popular M-JPEG file formats usually build on top of 
the JPEG-specified formats with little or no modifica- 
tion. For example, Microsoft's audio-video interleaved 
(AVI) format encapsulates each JPEG frame with its 
associated audio and adds an index to the start of each 
frame at the end of the file. Video editing on a frame- 
by-frame basis is possible with this format. Another 
advantage is frame-limited error propagation in net- 
worked, distributed applications. Many video digitizer 
boards incorporate JPEG compression in hardware to 
compress and decompress video in real time. Digital's 
Sound & Motion J300 and FullVideo Supreme JPEG 
are two such boards. The baseline JPEG codec is a 
symmetric algorithm as may be seen in Figure 2a and 
Figure 3. 

ITU-T's Recommendation H.261 The ITU-T's Recom- 
mendation H.261 is a motion-compensated, DCT- 
based video coding standard. Designed for the 
teleconferencing market and developed primarily for 
low-bit-rate Integrated Services Digital Network 
(ISDN) services, H.261 shares similarities with ISO's 
JPEG still-image compression standard. The target bit 
rate is p × 64 kilobits per second with p ranging 
between 1 and 30 (H.261 is also known as p × 64). 
Only two frame resolutions, common intermediate 
format (CIF, 352 by 288) and quarter-CIF (QCIF, 
176 by 144), are allowed. All standard-compliant 
codecs must be able to operate with QCIF; CIF is 
optional. The input color space is fixed by the 
International Radio Consultative Committee (CCIR) 
601 YCbCr standard with 4:2:0 subsampling (sub- 
sampling of chrominance components by 2:1 in both 
the horizontal and the vertical direction). Two types 
of frames are defined: key frames that are coded 



independently and non-key frames that are coded 
with respect to a previous frame. Key frames are 
coded in a manner similar to JPEG. For non-key 
frames, block-based motion compensation is per- 
formed to compute interframe differences, which are 
then DCT coded and quantized. The block size is 
16 by 16, and each block can have a different quanti- 
zation table. Finally, a variable word-length encoder 
(usually employing Huffman and run-length methods) 
is used for coding the quantized coefficients. Rate 
control is done by dropping frames, skipping blocks, 
and increasing quantization. Error correction codes 
are embedded in the bit stream to help detect and 
possibly correct transmission errors. Figure 2b shows 
a block diagram of an H.261 decompressor. 
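
The numbers above imply how much compression an H.261 codec must deliver. The sketch below works through that arithmetic. The frame rate is an assumption supplied by the caller, since the text does not state one, and the function names are hypothetical.

```python
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}
BITS_PER_PIXEL_420 = 12  # 8-bit Y plus Cb and Cr at one sample per 4 pixels


def channel_rate(p):
    """Target bit rate in bits per second for a p x 64 kilobit channel."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * 64_000


def raw_bits_per_frame(fmt):
    """Uncompressed size of one 4:2:0 frame in bits."""
    width, height = FORMATS[fmt]
    return width * height * BITS_PER_PIXEL_420


def required_compression(fmt, p, frames_per_second):
    """Ratio of the raw video rate to the channel rate.
    frames_per_second is an assumed value, not taken from the text."""
    return raw_bits_per_frame(fmt) * frames_per_second / channel_rate(p)


qcif_ratio = required_compression("QCIF", p=1, frames_per_second=10)
```

For example, QCIF at an assumed 10 frames per second over a single 64-kilobit channel requires a compression ratio of roughly 47 to 1, which is why frame dropping, block skipping, and coarser quantization are all used for rate control.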

ISO's MPEG-1 Video Standard The MPEG-1 video 
standard was developed by ISO's Moving Picture 
Experts Group (MPEG). Like the H.261 algorithm, 
MPEG-1 is also an interframe video codec that 
removes spatial redundancy by compressing key 
frames using techniques similar to JPEG and removes 
temporal redundancy through motion estimation and 
compensation. The standard defines three different 
types of frames or pictures: intra or I-frames that are 
compressed independently; predictive or P-frames 
that use motion compensation from the previous I- 
or P-frame; and bidirectional or B-frames that contain 
blocks predicted from either a preceding or following 
P- or I-frame (or interpolated from both). Compres- 
sion is greatest for B-frames and least for I-frames. 
(A fourth type of frame, called the D-frame or the 
DC-intracoded frame, is also defined for improving 
fast-forward-type access, but it is hardly ever used.) 
There is no restriction on the input frame dimensions, 
though the target bit rate of 1.5 megabits per second is 
for video containing SIF frames. Subsampling is fixed 
at 4:2:0. MPEG-1 employs adaptive quantization of 
DCT coefficients for compressing I-frames and for 
compressing the difference between actual and pre- 
dicted blocks in P- and B-frames. A 16-by-16 sliding 
window, called a macroblock, is used in motion esti- 
mation; and a variable word-length encoder is used in 
the final step to ftirther lower the output bit rate. The 
full MPEG-1 standard specifies a system stream that 
includes a video and an audio substrcam, along with 
timing information needed for synchronization 
between the two. The video substream contains the 
compressed video data and coding parameters such 
as picture rate, bit rate, and image size. MPEG-1 has 
become increasingly popular primarily because it 
offers better compression than JPEG without compro- 
mising on quality. Several vendors and chip manu- 
facturers offer specialized hardware for MPEG 
compression and decompression. Figure 2c shows 
a block diagram of an MPEG-1 video decompressor. 

Intel's INDEO Video Compression Algorithm Intel's
proprietary INDEO video compression algorithm is 
used primarily for video presentations on personal 
computer (PC) desktops. It employs color subsam- 
pling, pixel differencing, run-length encoding, vector 
quantization, and variable word-length encoding. The
chrominance components are heavily subsampled. For 
every block of 4-by-4 luminance samples, there is 
a single sample of Cb and Cr. Furthermore, samples 
are shifted one bit to convert them to 7-bit values. The 
resulting precompression format is called YVU9,
because on average there are 9 bits per pixel. This 
subsampling alone yields a reduction of 9/24. Run- 
length encoding is employed to encode any run of 
zero pixel differences. 
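
The 9-bits-per-pixel figure and the 9/24 reduction can be checked with a little arithmetic. The sketch below assumes that every stored sample occupies 8 bits and that the uncompressed reference is 24-bit RGB; the function name is illustrative only.

```python
def bits_per_pixel(block_w, block_h, bits_per_sample=8):
    """Average bits per pixel when one Cb and one Cr sample cover a block."""
    luma_bits = block_w * block_h * bits_per_sample   # one Y sample per pixel
    chroma_bits = 2 * bits_per_sample                 # one Cb plus one Cr per block
    return (luma_bits + chroma_bits) / (block_w * block_h)

yvu9 = bits_per_pixel(4, 4)      # 4-by-4 luminance block, as described for INDEO
reduction = yvu9 / 24            # relative to 24-bit RGB
print(yvu9, reduction)
```

The same helper gives 12 bits per pixel for a 2-by-2 block, which corresponds to the 4:2:0 subsampling used by MPEG-1.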

PCWG's INDEO-C Video Compression Algorithm 

INDEO-C is the video compression component of a
teleconferencing system derived from the Personal 
Conferencing Specification developed by the Personal 
Conferencing Work Group (PCWG), an industry 
group led by Intel Corporation. Like the MPEG stan- 
dard, the PCWG specification defines the compressed 
bit stream and the decoder but not the encoder. 
INDEO-C is optimized for low-bit-rate, ISDN-based 
connections and, unlike its desktop compression 
cousin, is transform-based. It is an interframe algo-
rithm that uses motion estimation and a 4:1 chromi-
nance subsampling in both directions. Spatial and
temporal loop filters are used to remove high- 
frequency artifacts. The transform used for converting 
spatial data to frequency coefficients is the slant trans- 
form, which has the advantage of requiring only shifts 
and adds with no multiplies. Like the DCT, the fast 
slant transform (FST) is applied on image subblocks 
for coding both intraframes and difference frames. As
was the case in other codecs, run-length coding and 
Huffman coding are employed in the final step. 
Compression and decompression of video in software
is faster than other interframe schemes like MPEG-1
and H.261.

Compression Schemes under Development In addi- 
tion to the five compression schemes described in this 
section, four other video compression standards, 
which are currently in various stages of development 
within ISO and ITU-T, are worth mentioning: ISO's 
MPEG-2, ITU-T's Recommendation H.262, ITU-T's 
Recommendation H.263, and ISO's MPEG-4.
Although the techniques employed in MPEG-2, 
H.262, and H.263 compression schemes are similar to 



the ones discussed above, the target applications are 
different. H.263 focuses on providing low-bit-rate 
video (below 64 kilobits per second) that can be trans- 
mitted over narrowband channels and used for real- 
time conversational services. The codec would be 
employed over the plain old telephone system (POTS) 
with modems that have the V.32 and the V.34 modem 
technologies. MPEG-2, on the other hand, is aimed at 
bit rates above 2 megabits per second, which support 
a wide variety of formats for multimedia applications 
that require better quality than MPEG-1 can achieve. 
One of the more popular target applications for 
MPEG-2 is coding for high-definition television 
(HDTV). It is expected that ITU-T will adapt MPEG-2
so that Recommendation H.262 will be very similar, 
if not identical, to it. Finally, like Recommendation 
H.263, ISO's MPEG-4's charter is to develop a generic 
video coding algorithm for low-bit-rate multimedia 
applications over a public switched telephone network 
(PSTN). A wide variety of applications, including 
those operating over error-prone radio channels, are
being targeted. The standard is expected to embrace 
coding methods that are very different from its precur- 
sors and will include the so-called second-generation 
coding techniques. MPEG-4 is expected to reach
draft stage by November 1997. 

This ends our discussion on video compression tech- 
niques and standards. In the next section, we turn our 
attention to the other component of the video play- 
back solution, namely video rendering. We describe the 
general process of video rendering and present a novel 
algorithm for efficient mapping of out-of-range colors
to feasible red, green, and blue (RGB) values that can 
be represented on the target display device. Out-of- 
range colors can occur when the display quality is 
adjusted during video playback. 

Video Presentation 

Video presentation or rendering is the second impor- 
tant component in the video playback pipeline (see 
Figure 1). The job of this subsystem is to accept 
decompressed video data and present it in a window of 
specified size on the display device using a specified 
number of colors. The basic components are sketched 
in Figure 4 and described in more detail in a previous 
issue of this Journal. Today, most desktop systems do
not include hardware options to perform these steps, 
but some interesting cases are available as described in 
this issue. When such accelerators are not available,
software-only implementation is necessary. Software
rendering algorithms, although very efficient, can still
consume as many computation cycles as are used to 
decompress the data. 

All major video standards represent image data in a 
luminance-chrominance color space. In this scheme, 
each pixel is composed of a single luminance compo- 
nent, denoted as Y, and two chrominance components 
that are sometimes referred to as color difference sig- 
nals Cb and Cr, or signals U and V. The relationship 
between the familiar RGB color space and YUV can be 
described by a 3-by-3 linear transformation:

$$\begin{bmatrix} r \\ g \\ b \end{bmatrix} = M \begin{bmatrix} y \\ u \\ v \end{bmatrix}, \tag{2}$$

where the transformation matrix,

$$M = \begin{bmatrix} 1 & 0 & a \\ 1 & b & c \\ 1 & d & 0 \end{bmatrix}. \tag{3}$$

The matrix is somewhat simple with only four values
that are not 0 or 1. These constants are a = 1.402,
b = −.344, c = −.714, and d = 1.772.
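
As a minimal sketch of the transformation, the code below applies the matrix M to one normalized YUV triplet. It assumes a = 1.402, b = −0.344, c = −0.714, and d = 1.772, with y in [0, 1] and u, v in [−0.5, 0.5]; the function name is illustrative and is not part of any library described here.

```python
# Constants of the transformation matrix M (assumed values, see lead-in).
A, B, C, D = 1.402, -0.344, -0.714, 1.772

M = [
    [1.0, 0.0, A],    # r = y + a*v
    [1.0, B,   C],    # g = y + b*u + c*v
    [1.0, D,   0.0],  # b = y + d*u
]

def yuv_to_rgb(y, u, v):
    """Multiply the column vector (y, u, v) by M and return (r, g, b)."""
    return tuple(row[0] * y + row[1] * u + row[2] * v for row in M)

print(yuv_to_rgb(0.5, 0.0, 0.0))   # zero chrominance gives a neutral gray
print(yuv_to_rgb(0.5, 0.0, 0.5))   # the red component exceeds 1: out of range
```

Only four multiplications per pixel are needed, because the remaining matrix entries are 0 or 1.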

The RGB color space cube becomes a parallelepiped 
in YUV space. This is pictured in Figure 5, where the 
black corner is at the bottom, and the white corner is 
at the top; the red, green, and blue corners are as 
labeled. The chrominance signals U and V are usually 
subsampled, so the rendering subsystem must first 
restore these components and then transform the 
YUV triplets to RGB values. 

Typical frame buffers are configured with 8 bits of 
color depth. This hardware colormap must, in general, 
be shared by multiple applications, which puts a pre- 
mium on each of the 256 color slots in the map. Each 
application, therefore, must be able to request render- 
ing to a limited number of colors. This can be accom- 
plished most effectively with a multilevel dithering 
scheme, as represented by the dither block in Figure 4. 




Figure 5 

The RGB "Cube" in YUV Space 



The color adjustment block controls brightness, con- 
trast and saturation by means of simple look-up tables. 

Along with up-sampling the chrominance, the scale 
block in Figure 4 can also change the size of the 
image. Although arbitrary scaling is best performed in 
combination with filtering, it is found to be too expen-
sive to do in a software-only implementation. For the
case of enlargement, a trade-off can be made between 
image quality and speed; contrary to what is shown in 
Figure 4, image enlargement can occur after dithering 
and color space converting. Of course, this would 
result in scaled dithered pixels, which are certainly less 
desirable, but it would also result in faster processing. 

To optimize computational efficiency, color space 
conversion from YUV to RGB takes place after YUV 
dithering. Dithering greatly reduces the number of
YUV triplets, thus allowing a single look-up table 
to perform the color space conversion to RGB as well 
as map to the final 8-bit color index required by the
graphics display system. Digital pioneered this idea 
and has used it in a number of hardware and software-
only products.
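
The idea of folding color space conversion into a single table can be sketched as follows. This is an illustration under stated assumptions, not the actual product code: the level counts (8 luminance levels, 5 levels each for u and v), the index layout, and the helper names are all hypothetical, and the table stores RGB triplets where a real system would store the final colormap index. Out-of-range values are simply clamped here for brevity.

```python
# Assumed constants of the YUV-to-RGB matrix.
A, B, C, D = 1.402, -0.344, -0.714, 1.772

def clamp01(x):
    return min(1.0, max(0.0, x))

def lut_index(yi, ui, vi, nu, nv):
    """Flatten a dithered (y, u, v) level triplet into one table index."""
    return (yi * nu + ui) * nv + vi

def build_lut(ny, nu, nv):
    """Precompute one 8-bit RGB entry for every dithered YUV triplet."""
    lut = []
    for yi in range(ny):
        for ui in range(nu):
            for vi in range(nv):
                y = yi / (ny - 1)            # 0 .. 1
                u = ui / (nu - 1) - 0.5      # -0.5 .. 0.5
                v = vi / (nv - 1) - 0.5
                rgb = (y + A * v, y + B * u + C * v, y + D * u)
                lut.append(tuple(round(255 * clamp01(c)) for c in rgb))
    return lut

lut = build_lut(8, 5, 5)                      # 200 entries fit in a 256-slot map
white = lut[lut_index(7, 2, 2, 5, 5)]         # full luminance, zero chrominance
```

At playback time the per-pixel work reduces to one index computation and one table read; all arithmetic is paid once, when the table is built.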

Mapping Out-of-Range Colors 

Besides the obvious advantages of speed and simplic- 
ity, using a look-up table to convert dithered YUV val- 
ues to RGB values has the added feature of allowing 
careful mapping of out-of-range YUV values. Refer-
ring again to Figure 5, the RGB solid describes those
r, g, and b values that are feasible, that is, have the nor-
malized range 0 ≤ r, g, b ≤ 1. The range of possible val-
ues in YUV space are those for 0 ≤ y ≤ 1 and −.5 ≤ u,
v ≤ .5. It turns out that the RGB solid occupies only
23.3 percent of this possible YUV space; thus there
is ample possibility for so-called infeasible or out-of-
range colors to occur. Truncating the r, g, and b values
of these colors has the effect of mapping back to the
RGB parallelepiped along lines perpendicular to its
nearest surface; this is undesirable since it will result
in changing both the hue angle or polar orientation in
the chrominance plane and the luminance value. By
storing the mapping in a look-up table, decisions can
be made a priori as to exactly what values the out-of-
range values should map to.

There is a mapping where both the luminance or $y$
value and the hue angle are held constant at the
expense of a change in saturation. This section details
how a closed-form solution can be found for such a
mapping. Figure 6 is a cross section of the volume in
Figure 5 through a plane at $y = y_0$. The object is to find
the point on the surface of the RGB parallelepiped that
maps the out-of-range point $(y_0, u_0, v_0)$ in the plane of
constant $y_0$ (constant luminance) and along a straight
line to the u-v origin (constant hue angle). The solu-
tion is the intersection of the closest RGB surface and
the line between $(y_0, u_0, v_0)$ and $(y_0, 0, 0)$. This line can
be parametrically represented as the locus $(y_0, \alpha u_0, \alpha v_0)$
for a single parameter $\alpha$. The RGB values for these
points are

$$\begin{bmatrix} r \\ g \\ b \end{bmatrix} = M \begin{bmatrix} y_0 \\ \alpha u_0 \\ \alpha v_0 \end{bmatrix}, \tag{4}$$

where the matrix M is as given in equation (2). To find
where this parametric line will intersect the RGB paral-
lelepiped, we can first solve for the $\alpha$ at the intercept val-
ues at each of the six bounding surface planes as follows:

$$\alpha_1 = \frac{-y_0}{a v_0}, \quad \alpha_2 = \frac{1 - y_0}{a v_0}, \quad \alpha_3 = \frac{-y_0}{b u_0 + c v_0}, \quad \alpha_4 = \frac{1 - y_0}{b u_0 + c v_0}, \quad \alpha_5 = \frac{-y_0}{d u_0}, \quad \alpha_6 = \frac{1 - y_0}{d u_0}.$$

Exactly three $\alpha_i$ will be negative, with each describing
the intercept with extended RGB surface planes oppo-
site the u-v origin. Of the remaining three $\alpha_i$, the two
largest values will describe intercepts with extended
RGB surface planes in infeasible RGB space. This is
because the RGB volume, a parallelepiped, is a convex
polyhedron. Thus the solution must simply be the
smallest positive $\alpha_i$. Plugging this value of $\alpha$ into equa-
tion (4) produces the desired RGB value.
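
The closed-form mapping can be sketched in a few lines. The code below is a sketch under stated assumptions rather than the authors' implementation: it assumes the matrix constants a = 1.402, b = −0.344, c = −0.714, d = 1.772, a luminance strictly between 0 and 1, and hypothetical function names. It solves for the six plane intercepts, takes the smallest positive one, and leaves already feasible colors unchanged by never letting the parameter exceed 1.

```python
A, B, C, D = 1.402, -0.344, -0.714, 1.772   # assumed matrix constants

def to_rgb(y, u, v):
    return (y + A * v, y + B * u + C * v, y + D * u)

def map_to_gamut(y0, u0, v0):
    """Reduce saturation at constant luminance and hue until RGB is feasible."""
    if not 0.0 < y0 < 1.0:
        gray = min(1.0, max(0.0, y0))        # degenerate case: only gray is feasible
        return (gray, gray, gray)
    # Rate of change of r, g, and b with respect to alpha along the line.
    slopes = (A * v0, B * u0 + C * v0, D * u0)
    alphas = []
    for s in slopes:
        if s != 0.0:                         # a zero slope never meets its planes
            alphas.append((0.0 - y0) / s)    # intercept with the "component = 0" plane
            alphas.append((1.0 - y0) / s)    # intercept with the "component = 1" plane
    positive = [a for a in alphas if a > 0.0]
    alpha = min(positive) if positive else 1.0
    alpha = min(alpha, 1.0)                  # alpha >= 1 means the color was in range
    return to_rgb(y0, alpha * u0, alpha * v0)

mapped = map_to_gamut(0.5, 0.0, 0.5)         # red would otherwise be 1.201
```

Because the result depends only on the (dithered) YUV triplet, it can be computed once per table entry when the look-up table is built, so the mapping costs nothing at playback time.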



When we started this project, we had two objectives in 
mind: to showcase the processing power of Digital's 
newly developed Alpha processor and to use this 
power to make digital video easily available to devel- 
opers and end users by providing extremely low-cost 
solutions. We knew that because of the compute-
intensive nature of video processing, Digital's Alpha
processor would outperform any competitive proces-
sor in a head-to-head match. By providing the ability
to manipulate good-quality desktop video without the
need for additional hardware, we wanted to make
Alpha-based systems the computers of choice for end
users who wanted to incorporate multimedia into 
their applications. 

Our objectives translated to the creation of a soft- 
ware video library that became a reality because of 
three key observations. The first one is embedded in 
our motivation: processors had become powerful
enough to perform complex signal-processing opera-
tions at real-time rates. With the potential of even
greater speeds in the near future, low-cost multimedia
solutions would be possible since audio and video
decompression could be done on the native processor
without any additional hardware.

A second observation was that multiple emerging 
audio/video compression standards, both formal and 
industry de facto, were gaining popularity with appli- 
cation vendors and hence needed to be supported 
on Digital's platforms. On careful examination of the
compression algorithms, we observed that most of 
the prominent schemes used common building 
blocks (see Figure 2). For example, all five interna- 
tional standards— JPEG, MPEG-1, MPEG-2, H.261, 
and H.263 — have DCT-based transform coders fol- 
lowed by a quantizer. Similarly, all five use Huffman 
coding in their final step. This meant that work done 
on one codec could be reused for others. 

A third observation was that the most common 
component of video-based applications was video 
playback (for example, videoconferencing, video-on- 
demand, video player, and desktop television). The 
output decompressed streams from the various 
decoders have to be software-rendered for display on 
systems that do not have support for color space con- 
version and dithering in their graphics adapters. An 
efficient software rendering scheme could thus be 
shared by all video players. 

With these observations in mind, we developed 
a software video library containing quality implemen- 
tations of ISO, ITU-T, and industry de facto video 
coding standards. In the sections to follow, we present 
the architecture, implementation, optimization, and 
performance of the software video library. We com- 
plete our presentation by describing examples of 
video-based applications written on top of this library,
including a novel video screen saver we call Video
Odyssey and a software-only video player.

Architecture 

Keeping in mind the observations outlined above, we 
designed a software video library (SLIB) that would 

■ Provide a common architecture under which mul- 
tiple audio and video codecs and renderers could 
be accessed 

■ Be the lowest, functionally complete layer in the 
software video codec hierarchy 

■ Be fast, extensible, and thread-safe, providing reen- 
trant code with minimal overhead 

■ Provide an intuitive, simple, flexible, and extens- 
ible application programming interface (API) 
that supports a client-server model of multimedia 
computing 

■ Provide an API that would accommodate multiple 
upper layers, allowing for easy and seamless integra- 
tion into Digital's multimedia products 

Our intention was not to create a library that would 
be exposed to end-user applications but to create one 
that would provide a common architecture for video 
codecs for easy integration into Digital's multimedia 
products. SLIB's API was purposely designed to be 
a superset of Digital's Multimedia Services' API for 
greater flexibility in terms of algorithmic tuning and 
control. The library would fit well under the actual 
programming interface provided to end users by
Digital's Multimedia Services. Digital's Multimedia 
API is the same as Microsoft's Video For Windows 
API, which facilitates the porting of multimedia appli- 
cations from Windows and Windows NT to Digital 
UNIX and OpenVMS platforms. Figure 7 shows SLIB 
in relation to Digital's multimedia software hierarchy. 
The shaded regions indicate the topics discussed in 
this paper. 

As mentioned, the library contains routines for 
audio and video codecs and Digital's proprietary video-
rendering algorithms. The routines are optimized
both algorithmically and for the particular platform on 
which they are offered. The software has been success- 
fully implemented on multiple platforms, including 
the Digital UNIX, the OpenVMS, and Microsoft's 
Windows NT operating systems. 

Three classes of routines are provided for the three 
subsystems: (1) video compression and decompres- 
sion, (2) video rendering, and (3) audio processing. 
For each subsystem, routines can be further classified 
as (a) setup routines, (b) action routines, (c) query rou- 
tines, and (d) teardown routines. Setup routines create 
and initialize all relevant internal data structures. They 
also compute values for the various look-up tables such 
as the ones used by the rendering subsystem. Action 
routines perform the actual coding, decoding, and ren- 
dering operations. Query routines may be used before 
setup or between action routines. These provide the 
programmer with information about the capability 
of the codec such as whether or not it can handle a
particular input format and provide information about 
the bit stream being processed. These routines can also 
be used for gathering statistics. Teardown routines, as 
the name suggests, are used for closing the codec and 
destroying all internal memory (state information) 
associated with it. For all video codecs, SLIB provides 
convenience functions to construct a table of contents 
containing the offsets to the start of frames in the input 
bit stream. These convenience functions are useful for 
short clips: once a table of contents is built, random 
access and other VCR functions can be implemented 
easily. (These routines are discussed further in the sec- 
tion on sample applications.) 
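
A table of contents of this kind can be sketched as a scan for picture start codes. The example assumes an MPEG-1 video elementary stream, in which each coded picture begins with the four-byte picture start code 00 00 01 00; the function name is hypothetical and is not SLIB's actual convenience routine.

```python
PICTURE_START_CODE = b"\x00\x00\x01\x00"   # MPEG-1 video picture start code

def build_table_of_contents(bitstream):
    """Return the byte offset of every picture start code in the stream."""
    offsets = []
    pos = bitstream.find(PICTURE_START_CODE)
    while pos != -1:
        offsets.append(pos)
        # Resume the search just past the start code that was found.
        pos = bitstream.find(PICTURE_START_CODE, pos + len(PICTURE_START_CODE))
    return offsets

# Tiny synthetic stream: 3 filler bytes, a picture, 3 payload bytes, a picture.
clip = b"\xff\xff\xff" + PICTURE_START_CODE + b"abc" + PICTURE_START_CODE + b"z"
toc = build_table_of_contents(clip)
```

With the offsets in hand, random access amounts to seeking to `toc[n]` and resuming decoding from the nearest preceding key frame. The whole stream must be scanned once up front, which is why the approach suits short clips.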

Implementation of Video Codecs 

In this section, we present the program flow for multi- 
media applications that incorporate the various video 
codecs. These applications are built on top of SLIB. 
We also discuss specific calls from the library's API to 
explain concepts. 

Motion JPEG Motion JPEG is the de facto name of 
the compression scheme that uses the JPEG compres- 
sion algorithm developed for still images to code video 
sequences. The motion JPEG (or M-JPEG) player was 
the first decompressor we developed. We had recently
completed the Sound & Motion J300 adapter that
could perform JPEG compression, decompression,
and dithering in hardware. We now wanted to
develop a software decoder that would be able to
decode video sequences produced by the J300 and its
successor, the FullVideo Supreme JPEG adapter,
which uses the peripheral component interconnect
(PCI). Only baseline JPEG compression and decom-
pression have been implemented in SLIB. This is suffi-
cient for greater than 90 percent of today's existing
applications. Figure 2a and Figure 3 show the block 
diagrams for the baseline JPEG codec, and Figure 8 
shows the flow control for compressing raw video 
using the video library routines. Due to the symmetric 
structure of the algorithm, the flow diagram for the 
JPEG decompressor looks very similar to the one for 
the JPEG compressor. 

The amount of compression is controlled by the 
amount of quantization in the individual image frames 
constituting the video sequence. The coefficients for 
every 8-by-8 block within the image F(x,y) are quan- 
tized and dequantized as 



$$F_q(x,y) = \mathrm{round}\!\left(\frac{F(x,y)}{QTable(x,y)}\right), \qquad \hat{F}(x,y) = F_q(x,y) \times QTable(x,y). \tag{5}$$



In equation (5), QTable represents the quantization 
matrices, also called visibility matrices, associated 
with the frame F(x,y). (Each component constituting
the frame can have its own QTable.) SLIB provides
routines to download QTables to the encoder explic-
itly; tables provided in the ISO specification can be
used as defaults. The library provides a quality factor 
that can scale the base quantization tables, thus pro- 
viding a control knob mechanism for varying the 
amount of compression from frame to frame. The 
quality factor may be dynamically varied between 
0 and 10,000, with a value of 10,000 causing no quan- 
tization (all quantization table elements are equal 
to 1), and a value of 0 resulting in maximum quantiza- 
tion (all quantization table elements are equal to 255). 
For intermediate values: 

$$QTable(x,y) = \mathrm{Clip}\!\left(\frac{VisibilityTable(x,y) \times (10^4 - QualFactor) \times 255}{10^4 \times \min_{x,y} VisibilityTable(x,y)}\right) \tag{6}$$



The Clip() function forces the out-of-bounds values to
be either 255 or 1. At the low end of the quality set- 
ting (small values of the quality factor), the above 
formula produces quantization tables that cause 
noticeable artifacts. 
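
The scaling rule can be sketched directly from its stated endpoints: a quality factor of 10,000 must yield a table of all 1s, and a quality factor of 0 a table of all 255s. The code below implements the base table times (10^4 − quality factor) times 255, divided by 10^4 times the smallest base entry, then clipped to the range 1 to 255. Rounding to the nearest integer is an assumption, the 2-by-2 base table is a toy stand-in for a real 8-by-8 visibility table, and the function names are hypothetical.

```python
def clip(value, lo=1, hi=255):
    """Force out-of-bounds values to the nearest legal table entry."""
    return max(lo, min(hi, value))

def scaled_qtable(visibility_table, qual_factor):
    """Scale a base quantization table by a quality factor in 0..10000."""
    vmin = min(min(row) for row in visibility_table)
    scale = (10_000 - qual_factor) * 255 / (10_000 * vmin)
    return [[clip(round(v * scale)) for v in row] for row in visibility_table]

base = [[16, 11],
        [12, 14]]                            # toy table, values are illustrative

no_quantization = scaled_qtable(base, 10_000)
max_quantization = scaled_qtable(base, 0)
intermediate = scaled_qtable(base, 9_900)
```

Dividing by the smallest base entry guarantees that every entry reaches 255 at a quality factor of 0, which is what makes the low end of the range so coarse.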

Although Huffman tables do not affect the quality 
of the video, they do influence the achievable bit rate 
for a given video quality. As with quantization tables, 
SLIB provides routines for loading and using custom 
Huffman tables for compression. Huffman coding 
works best when the source statistics are known; in 
practice, statistically optimized Huffman tables are
rarely used due to the computational overhead involved 
in their generation. In the case where these tables are 
not explicitly provided, the library uses as default the
baseline tables suggested in the ISO specification. In the 
case of decompression, the tables may be present in the 
compressed bit stream and can be examined by invok- 
ing appropriate query calls. In the AVI format, Huffman 
tables are not present in the compressed bit stream, and 
the default ISO tables are always used. 

Query routines for determining the supported 
input and output formats for a particular compressor 
are also provided. For M-JPEG compression, some of 
the supported input formats include interleaved 4:2:2 
YUV, noninterleaved 4:2:2 YUV, interleaved and non-
interleaved RGB, 32-bit RGB, and single component
(monochrome). The supported output formats 
include JPEG-compressed YUV and JPEG-compressed
single component. 

ISO's MPEG-1 Video Once we had implemented the
M-JPEG codec, we turned our attention to the MPEG-1 
decoder. MPEG-1 is a highly asymmetric algorithm. 
The committee developing this standard purposely 
kept the decompressor simple: it was expected that 
there would be many cases of compress once and 
decompress multiple times. In general, the task of com- 
pression is much more complex than that of decom- 
pression. As of this writing, achieving real-time 
performance for MPEG-1 compression in software 
is not possible. Thus we concentrated our energies 
on implementing and optimizing an MPEG-1 decom- 
pressor while leaving MPEG-1 compression for batch 
mode. Someday we hope to achieve real-time com- 
pression all in software with the Alpha processor. 
Figure 9 illustrates the high-level scheme of how SLIB
fits into an MPEG player. The MPEG-1 system stream 
is split into its audio and video substreams, and each 
is handled separately by the different components of 



the video library. Synchronization between audio and 
video is achieved at the application layer by using the 
presentation time-stamp information embedded in 
the system stream. A timing controller module within 
the application can adjust the rate at which video 
packets are presented to the SLIB video decoder and 
renderer. It can indicate to the decoder whether to 
skip the decoding of B- and P-frames. 

Figure 10 illustrates the flow control for an MPEG-1 
video player written on top of SLIB. The scheme relies 
on a callback function that is registered with the codec
during initial setup, and a SvAddBuffers function, writ-
ten by the client, which provides the codec with the bit-
stream data to be processed. The codec is primed by
adding multiple buffers, each typically containing 
a single video packet from the demultiplexed system
stream. These buffers are added to the codec's internal 
buffer queue. After enough data has been provided, the 
decoder is told to parse the bit stream in its buffer queue 
until it finds the next (first) picture. The client applica- 
tion can specify which type of picture to locate (I, P, or 
B) by setting a mask bit. After the picture is found and 
its information returned to the client, the client may 
choose to either decompress this picture or to skip it by 
invoking the routine to find the next picture. This pro- 
vides an effective mechanism for rate control and for 
VCR controls such as step forward, fast forward, step 
back, and fast reverse. If the client requests that a 
non-key picture (P or B) be decompressed and the 
codec does not have the required reference (I or P) pic- 
tures needed to perform this operation, an error is 
returned. The client can then choose to abort or pro- 
ceed until the codec finds a picture it can decompress. 

During steady state, the codec may periodically 
invoke the callback function to exchange messages with 
the client application as it compresses or decompresses 
the bit stream. Most messages sent by the codec expect 
some action from the client. For example, one of
the messages sent by the codec to the application is
a CB_END_BUFFERS message, which indicates the
codec has run out of data and the client needs to either
add more data buffers or abort the operation. Another
message, CB_RELEASE_BUFFERS, indicates the
codec is done processing the bit-stream data in a data
buffer, and the buffer is available for client reuse. One
possible action for the client is to fill this newly available
buffer with more data and pass it back to the codec. In
the other direction, the client may send messages to the
codec through a ClientAction field. Table 1 gives some
of the messages that can be sent to the codec by the
application.

Figure 10

Flow Control for MPEG-1 Video Playback

Table 1

List of Client Messages

Message            Interpretation

CLIENT_ABORT       Abort processing of the frame
CLIENT_CONTINUE    Continue processing the frame
CLIENT_DROP        Do not decompress
CLIENT_PROCESS     Start processing

Another use for the callback mechanism is to accom-
modate client operations that need to be intermixed
between video encoding/decoding operations. For
example, the application may want to process audio
samples while it is decompressing video. The codec can
then be configured such that the callback function is
invoked at a (near) periodic rate. A CB_PROCESSING
message is sent to the application by the codec at reg-
ular intervals to give it an opportunity for rate control
of video and/or to perform other operations.
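
The message exchange between codec and client can be illustrated with a toy model. Everything in the sketch below is hypothetical except the message names, which follow the text: `ToyCodec` stands in for a codec with an internal buffer queue, `add_buffer` plays the role of the buffer-adding call, and "processing" a buffer just records its contents. It is not SLIB's API.

```python
from collections import deque

CB_END_BUFFERS = "CB_END_BUFFERS"
CB_RELEASE_BUFFERS = "CB_RELEASE_BUFFERS"
CLIENT_ABORT = "CLIENT_ABORT"
CLIENT_CONTINUE = "CLIENT_CONTINUE"

class ToyCodec:
    """Hypothetical codec that pulls data from a queue of client buffers."""

    def __init__(self, callback):
        self.callback = callback        # registered once, during setup
        self.queue = deque()
        self.consumed = []

    def add_buffer(self, buf):
        self.queue.append(buf)

    def run(self):
        while True:
            if not self.queue:
                # Out of data: the client must add buffers or abort.
                if self.callback(self, CB_END_BUFFERS, None) == CLIENT_ABORT:
                    return
                continue
            buf = self.queue.popleft()
            self.consumed.append(bytes(buf))              # stand-in for decoding
            self.callback(self, CB_RELEASE_BUFFERS, buf)  # buffer may be reused

def make_client(packets):
    """Build a callback that feeds packets until none remain, then aborts."""
    pending = deque(packets)

    def callback(codec, message, buf):
        if message == CB_END_BUFFERS:
            if not pending:
                return CLIENT_ABORT
            codec.add_buffer(pending.popleft())
        return CLIENT_CONTINUE

    return callback

codec = ToyCodec(make_client([b"packet-1", b"packet-2", b"packet-3"]))
codec.run()
```

The client never needs its own thread in this model: it regains control each time the codec calls back, which is also where audio processing or rate-control decisions could be slotted in.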

Typically the order in which coded pictures are pre- 
sented to the decoder does not correspond to the 
order in which they are to be displayed. Consider the 
following example: 

Display Order    I1  B2  B3  P4  B5  B6  P7  B8
Decoder Input    I1  P4  B2  B3  P7  B5  B6  I10

The order mismatch is an artifact of the compression 
algorithm: a B-picture cannot be decoded until both
its past and future reference frames have been decoded. 
Similarly a P-picture cannot be decoded until its past 
reference frame has been decoded. To get around this 
problem, SLIB defines an output multibuffer. The size 
of this multibuffer is approximately equal to three 
times the size of a single uncompressed frame. For 
example, for a 4:2:0 subsampled CIF image, the size of
the multibuffer would be 352 by 288 by 1.5 by 3 bytes
(the exact size is returned by the library during initial 
codec setup). After steady state has been reached, each 
invocation to the decompress call yields the correct 
next frame to be displayed as shown in Figure 11. To 
avoid expensive copy operations, the multibuffer is
allocated and owned by the software above SLIB. 
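
The reordering and the buffer-size arithmetic can both be sketched briefly. The reordering rule used here is the usual one for I-, P-, and B-pictures: a B-picture is displayed as soon as it is decoded, while a reference picture (I or P) is held back until the next reference picture arrives. The function names are hypothetical, and the sample input assumes that B8 and B9 follow I10 in the decoder input.

```python
def display_order(decode_order):
    """Convert decoder-input order into display order."""
    out, held = [], None
    for pic in decode_order:
        if pic.startswith("B"):
            out.append(pic)            # B-pictures are displayed immediately
        else:
            if held is not None:
                out.append(held)       # the previous reference is now due
            held = pic                 # hold the new reference picture back
    if held is not None:
        out.append(held)               # flush the last reference picture
    return out

def multibuffer_bytes(width, height, frames=3, bytes_per_pixel=1.5):
    """Size of an output multibuffer holding several 4:2:0 frames."""
    return int(width * height * bytes_per_pixel * frames)

decoder_input = ["I1", "P4", "B2", "B3", "P7", "B5", "B6", "I10", "B8", "B9"]
shown = display_order(decoder_input)
cif_size = multibuffer_bytes(352, 288)
```

Three frames are enough because a decoder never needs more than two reference pictures plus the B-picture currently being reconstructed.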

ITU-T's Recommendation H.261 (a.k.a. p × 64) At the
library level, decompressing an H.261 stream is very
similar to MPEG-1 decoding with one exception: 
instead of three types of pictures, the H.261 recom- 
mendation defines only two, key frames and non-key 
frames (no bidirectional prediction). The implication 
for implementation is that the size of the multibuffer is 
approximately twice the size of a single decompressed 
frame. Furthermore, the order in which compressed 
frames are presented to the decompressor is the same 
as the order in which they are to be displayed. 

To satisfy the H.261 recommendation, SLIB imple- 
ments a streaming interface for compression and 
decompression. In this model, the application feeds 
input buffers to the codec, which processes the data in 
the buffers and returns the processed data to the appli- 
cation through a callback routine. During decom- 
pression, the application layer passes input buffers 
containing sections of an H.261 bit stream. The bit 
stream can be divided arbitrarily, or, in the case of live 
teleconferencing, each buffer can contain data from a 
transmission packet. Empty output buffers are also 
passed to the codec to fill with reconstructed images. 
Picture frames do not have to be aligned on buffer 



boundaries. The codec parses the bit stream and, 
when enough data is available, reconstructs an image. 
Input buffers are freed by calling the callback routine. 
When an image is reconstructed, it is placed in an out- 
put buffer and the buffer is returned to the applica- 
tion through the callback routine. The compression 
process is similar, but input buffers contain images and 
output buffers contain bit-stream data. One advantage
to this streaming interface is that the application layer 
does not need to know the syntax of the H.261 bit 
stream. The codec is responsible for all bit-stream 
parsing. Another advantage is that the callback mecha- 
nism for returning completed images or bit-stream 
buffers allows the application to do other tasks with- 
out implementing multithreading. 

SLIB's architecture and API can easily accommo- 
date ISO's MPEG-2 and ITU-T's H.263 video com- 
pression algorithms because of their similarity to the 
MPEG-1 and H.261 algorithms. 

Implementation of Video Rendering 

Our software implementation of video rendering 
essentially parallels the hardware realization detailed 
elsewhere in this issue. As with the hardware imple-
mentation, the software renderer is fast and simple
because the complicated computations are performed 
offline in building the various look-up tables. In both 
hardware and software cases, a shortcut is achieved by 
dithering in YUV space and then converting to some 
small number of RGB index values in a look-up table.
Although in most cases the mapping values in the 
look-up tables remain fixed for the duration of the 
run, the video library provides routines to dynamically 
adjust image brightness, contrast, saturation, and the 
number of colors. Image scaling is possible but affects 
performance. When quality is important, the software 
performs scaling before dithering and when speed is 
the primary concern, it is done after dithering. 

Optimizations 

We approached the problem of optimization from two 
directions: Platform-independent optimizations, or 
algorithmic enhancements, were done by exploiting 
knowledge of the compression algorithm and the 
input data stream. Platform-dependent optimizations
were done by examining the services available from 
the underlying operating system and by evaluating the 
attributes of the system's processor. 

As can be seen from Table 2, the DCT is one of the 
most computationally intensive components in the 
compression pipeline. It is also common to all five 
international standards. Therefore, a special effort was 
made in choosing and optimizing the DCT. Since all 
five standards call for the inverse DCT (IDCT) to be 
postprocessed with inverse quantization, significant 
algorithmic savings were obtained by computing a 
scalar multiple of the DCT and merging the appropri- 
ate scaling into the quantizer. The DCT implemented 
in the library is a modified version of the one-dimensional
scaled DCT proposed by Arai et al. The
two-dimensional DCT is obtained by performing a
one-dimensional DCT on the columns followed by
a one-dimensional DCT on the rows. A total of 80
multiplies and 464 adds are needed for a fully populated
8-by-8 block. In highly compressed video, the
coefficient matrix to be transformed is generally sparse 
because a large number of elements are "zeroed" out 
due to heavy quantization. We exploit this fact to 
speed up the DCT computations. In the decoding 
process, the Huffman decoder computes and passes to 
the IDCT a list of rows and columns that are all zeros. 
The IDCT then simply skips these columns. Another
optimization uses a different IDCT, depending on the 
number of nonzero coefficients. The overall speedup 
due to these techniques is dependent on the amount 
of compression. For lightly compressed video, we 
observed that the overhead due to these techniques
slowed down the decompressor. We overcame this dif- 
ficulty by building into SLIB the adaptive selection of 
the appropriate optimization based on continuous sta- 
tistics gathering. Run-time statistics of the number of 
blocks per frame that are all zeros are maintained, and 
the number of frames over which these statistics are 
evaluated is provided as a parameter for the client 
applications. Statistic gathering is minimal: a counter 
update and an occasional compare. 
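
The column-skipping idea can be sketched as follows. This is an illustration only: it uses a direct, unoptimized 8-point inverse DCT rather than the scaled DCT described above, and `find_zero_cols` is a hypothetical stand-in for the list that the Huffman decoder would pass along.

```python
import math

N = 8


def idct_1d(v):
    """Direct 8-point inverse DCT with orthonormal scaling (not the scaled DCT)."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            c = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            s += c * v[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s)
    return out


def find_zero_cols(block):
    """Stand-in for decoder bookkeeping: columns whose coefficients are all zero."""
    return frozenset(c for c in range(N) if all(block[r][c] == 0 for r in range(N)))


def idct_2d(block, zero_cols=frozenset()):
    """Separable 2-D IDCT: columns first, then rows.

    Columns listed in zero_cols are skipped, because the inverse
    transform of an all-zero vector is all zero.
    """
    tmp = [[0.0] * N for _ in range(N)]
    for c in range(N):
        if c in zero_cols:
            continue
        col = idct_1d([block[r][c] for r in range(N)])
        for r in range(N):
            tmp[r][c] = col[r]
    return [idct_1d(row) for row in tmp]
```

With heavy quantization most columns are empty, so the first pass shrinks to a few 1-D transforms; with light compression the bookkeeping is pure overhead, which is the trade-off the adaptive selection addresses.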



The second component of the video decoders we 
looked at was the Huffman decoder. Analysis of the 
compressed data indicated that short-code-length
symbols were a large part of the compressed bit
stream. The decoder was written to handle frequently
occurring very short codes (< 4 bits) as special cases,
thus avoiding loads from memory. For short codes
(< 8 bits), look-up tables were used to avoid bit-by-bit
decoding. Together, these two classes of codes
account for well over 90 percent of the total collection
of the variable-length codes.
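
A table-driven decoder for short codes can be sketched as below. The code table, the 8-bit window, and the use of a bit string instead of packed bytes are assumptions made for readability; the special-casing of very short codes is not shown.

```python
LOOKUP_BITS = 8  # assumed window width; every code must fit in it


def build_lookup(codes):
    """Build a 2**LOOKUP_BITS table from {symbol: bit string} (prefix-free codes).

    Each code fills every table slot whose leading bits match it, so a
    single indexed load yields both the symbol and the code length.
    """
    table = [None] * (1 << LOOKUP_BITS)
    for symbol, bits in codes.items():
        pad = LOOKUP_BITS - len(bits)
        base = int(bits, 2) << pad
        for fill in range(1 << pad):
            table[base + fill] = (symbol, len(bits))
    return table


def decode(bitstring, table, count):
    """Decode `count` symbols by peeking LOOKUP_BITS bits at a time."""
    out = []
    pos = 0
    for _ in range(count):
        window = bitstring[pos:pos + LOOKUP_BITS].ljust(LOOKUP_BITS, "0")
        symbol, length = table[int(window, 2)]
        out.append(symbol)
        pos += length  # consume only the bits the code actually used
    return out
```

The table replaces a bit-by-bit tree walk with one lookup per symbol, at the cost of 256 entries per code table.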

A third compute-intensive operation is raster-to-block
conversion in preparation for compression. This
operation had the potential of slowing down the
compressor on Alpha-based systems on which byte and
short accesses are done indirectly. We implemented an
assembly language routine that would read the
uncompressed input color image and convert it to 
three one-dimensional arrays containing 8-by-8 
blocks in sequence. Special care was taken to keep 
memory references aligned. Relevant bytes were 
obtained through shifting and masking operations. 
Level shifting was also incorporated within the routine 
to avoid touching the same data again. 
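
The data movement involved can be sketched for a single component plane as follows. This is a high-level illustration with assumed names; it does not reproduce the aligned loads, shifting, and masking of the assembly routine, and it handles one plane rather than an interleaved color image.

```python
def raster_to_blocks(plane, width, height, block=8, level_shift=128):
    """Reorder a row-major plane into consecutive block-by-block samples.

    The level shift (subtracting 128 for 8-bit data) is folded into the
    same pass so that each sample is touched only once.
    """
    assert width % block == 0 and height % block == 0
    out = []
    for by in range(0, height, block):          # block rows, top to bottom
        for bx in range(0, width, block):       # blocks, left to right
            for y in range(by, by + block):     # the 8 raster rows of one block
                start = y * width + bx
                out.extend(s - level_shift for s in plane[start:start + block])
    return out
```

For a 16-by-8 plane, the first 64 output values form the left block and the next 64 the right block, each stored row by row.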

Other enhancements included replacing multiplies 
and divides with shifts and adds, avoiding integer to 
floating-point conversions, and using floating-point 
operations wherever possible. This optimization is 
particularly suited to the Alpha architecture, where 
floating-point operations are significantly faster than
integer operations. We also worked to reduce memory 
bandwidth. Ill-placed memory accesses can stall the 
processor and slow down the computations. Instruc- 
tions generated by the compiler were analyzed and 
sometimes rescheduled to avoid data hazards, to keep
the on-chip pipeline full, and to avoid unnecessary
loads and stores. Critical and small loops were unrolled 
to make better use of floating-point pipelines. 
Reordering the computations to reuse data already in 
registers and caches helped minimize thrashing in the 
cache and the translation lookaside buffer. Memory 
was accessed through offsets rather than pointer
increments. More local variables than global variables
were used. Wherever possible, fixed values were hard
coded instead of using variables that would need to
be computed. References were made to be 32-bit or
64-bit aligned accesses instead of byte or short.

Table 2
Typical Contributions of the Major Components in the Playback of Compressed Video (SIF)

Consistent with one of the design goals, SLIB was 
made thread-safe and fully reentrant. The Digital 
UNIX, the OpenVMS, and Microsoft's Windows NT 
operating systems all offer support for multithreaded 
applications. Applications such as video playback can 
improve their performance by having separate threads 
for reading, decompressing, rendering, and display- 
ing. Also, a multithreaded application scales up well on 
a multiprocessor system. Global multithreading is 
possible if the library code is reentrant or thread-safe. 
When we were trying to multithread the library inter- 
nals, we found that the overhead caused by the birth 
and death of threads, the increase in memory accesses, 
and the fragmentation of the codec pipeline caused 
operations to slow down. For these reasons, rou- 
tines within SLIB were kept single-threaded. Other 
operating-system optimizations such as memory lock- 
ing, priority scheduling, nonpreemption, and faster 
timers that are generally good for real-time applica- 
tions were experimented with but not included in our 
present implementation. 

Performance on Digital's Alpha Machines 

Measuring the performance of video codecs is gener- 
ally a difficult problem. In addition to the usual depen- 
dencies such as system load, efficiency of the 
underlying operating system, and application over- 
head, the speed of the video codecs is dependent on 
the content of the video sequence being processed. 
Rapid movement and action scenes can delay both 
compression and decompression, while slow motion 
and high-frequency content in a video sequence can 
generally result in faster decompression. When comparing
the performance of one codec against another,
the analyst must make certain that all codecs process 
the same set of video sequences under similar oper- 
ating conditions. Since no sequences have been 
accepted as standard, the analyst must decide which 
sequences are most typical. Choosing a sequence that 
favors the decompression process and presenting 
those results is not uncommon, but it can lead to false 
expectations. Sequences with similar peak signal-to- 
noise ratio (PSNR) may not be good enough, because 
more often than not PSNR (or equivalently the mean 
square error) does not accurately measure signal qual- 
ity. With these thoughts in mind, we chose some 
sequences that we thought were typical and used these 
to measure the performance of our software codecs. 
We do not present comparative results to codecs 



implemented elsewhere since we did not have access 
to these codecs and hence could not test these with the 
same sequences. 

Table 3 presents the characteristics of the three
video sequences used in our experiments. Let $L_i^{o}(x,y)$
and $L_i^{r}(x,y)$ represent the luminance component of
the original and the reconstructed frame $i$; let $n$ and $m$
represent the horizontal and vertical dimensions of
a frame; and let $N$ be the number of frames in the
video sequence. Then the Compression Ratio, the
average output BitsPerPixel, and the average PSNR are
calculated as

$$\text{Compression Ratio} = \frac{\sum_{i=1}^{N} \text{bits in frame}[i] \text{ of original video}}{\sum_{j=1}^{N} \text{bits in frame}[j] \text{ of compressed video}}$$

$$\text{Avg. BitsPerPixel} = \frac{1}{N \times n \times m} \sum_{i=1}^{N} \text{bits in frame}[i] \text{ of compressed video} \qquad (5)$$

Avg. PSNR = [expression not legible in the scanned source]
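
The first two measures can be computed directly from per-frame bit counts, as in the sketch below. The PSNR helper uses the conventional 8-bit definition, 10 log10(255^2 / MSE), which is an assumption here because the source expression is not legible; all function names are illustrative.

```python
import math


def compression_ratio(original_bits, compressed_bits):
    """Total bits of the original frames over total bits of the compressed frames."""
    return sum(original_bits) / sum(compressed_bits)


def avg_bits_per_pixel(compressed_bits, n, m):
    """Compressed bits averaged over N frames of n-by-m pixels."""
    return sum(compressed_bits) / (len(compressed_bits) * n * m)


def frame_psnr(original, reconstructed, peak=255):
    """Conventional PSNR in dB for one frame of luminance samples (assumed form)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak * peak / mse)
```

For example, two 10-by-10 frames stored at 16 bits per pixel (1600 bits each) and compressed to 32 bits each give a ratio of 50:1 and 0.32 bits per pixel.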



Figure 12 shows the PSNR for individual frames in 
the video sequences along with the distribution of 
frame size for each of three test sequences. Frame 
dimensions within a sequence always remain constant. 

Table 4 provides specifications of the workstations 
and PCs used in our experiments for generating 
the various performance numbers. The 21064 chip 
is Digital's first commercially available Alpha proces- 
sor. It has a load-store architecture, is based on a 
0.75-micrometer complementary metal-oxide semi- 
conductor (CMOS) technology, contains 1.68 million 
transistors, has a 7- and 10-stage integer and floating-point
pipeline, has separate 8-kilobyte instruction and
data caches, and is designed for dual issue. The
21064A microprocessor has the same architecture as
the 21064 but is based on a 0.5-micrometer CMOS 
technology and supports faster clock rates. 

We provide performance numbers for the video 
sequences characterized in Table 3. Figure 13 provides 
measured data on CPU usage when compressed video 
(from Table 3) is played back at 30 frames per second 
on the various test platforms shown in Table 4. We 
chose "percentage of CPU used" as a measure of per- 
formance because we wanted to know whether the 
CPU could handle any other tasks when it was doing 
video processing. Fortunately, it turned out the 



Digital Technical Journal 



Vol.7 No. 4 1995 



Table 3

Characteristics of the Video Sequences Used to Generate the Performance Numbers Shown in Figure 12

            Compression    Spatial Resolution   Temporal Resolution   Avg.           Compression   Avg. PSNR
Name        Algorithm      (width x height)     (No. of Frames)       BitsPerPixel   Ratio         (dB)

Sequence 1  M-JPEG         352 x 240            200                   0.32           50:1          31.56
Sequence 2  MPEG-1 Video   352 x 288            200                   0.17           69:1          32.28
            M-JPEG         352 x 240            200                   0.56           28:1          31.56
Sequence 3  INDEO          352 x 240            200                   0.16           47:1          28.73

Table 4

Specifications of Systems Used in Experimentation

System Name                          CPU   Bus   Clock Rate   Cache   Memory   Operating System   Disk

AlphaStation 600 5/266 workstation         PCI                2 MB

answer was a resounding "Yes" in the case of Alpha
processors. The video playback rate was measured 
with software video rendering enabled. When hard- 
ware rendering is available, estimated values for video 
playback are provided. 

From Figure 13, it is clear that today's workstations
are capable of playing SIF video at full frame rates with
no hardware acceleration. High-quality M-JPEG and
MPEG-1 compressed video clips can be played at full
speed with 20 percent to 60 percent of the CPU available
for other tasks. INDEO decompression is faster
than M-JPEG and MPEG due to the absence of DCT
processing. (INDEO uses a vector quantization
method based on pixel differencing.) On three out of
the four machines tested, two SIF INDEO clips could
be played back at full speed with CPU capacity left
over for other tasks.

Figure 13
Percentage of CPU Required for Real-time Playback at 30 fps on Four Different Alpha-based Systems

The data also shows the advantage of placing the 
color conversion and rendering of the video in the 
graphics hardware (see Table 2 and Figure 13). 
Software rendering accounts for one-third of the total 
playback time. Since rendering is essentially a table 
look-up function, it is a good candidate for moving 
into hardware. If hardware video rendering is available, 
multiple M-JPEG and MPEG-1 clips can be played 
back on three of the four machines on which the soft- 
ware was tested. 

Software video compression is more time-consum- 
ing than decompression. All algorithms discussed in 
this paper are asymmetric in the amount of processing 
needed for compression and decompression. Even 
though the JPEG algorithm is theoretically symmetric, 
the performance of the JPEG decoder is better than 
that of the encoder. The difference in performance is 
due to the sparse nature of the quantized coefficient 
matrices, which is exploited by the appropriate IDCT 
optimizations. 

For video encoders, we measured the rate of com- 
pression for both SIF and quarter SIF (QSIF) formats. 
Since the overhead due to I/O affects the rate at which 
the compressor works, we present measured rates col- 
lected when the raw video sequence is read from disk 
and when it is captured in real time. The capture cards 
used in our experiments were the Sound & Motion 
J300 (for systems with the TURBOchannel bus) and 
the FullVideo Supreme (for PCI-based systems). The
compressed bit streams were stored as AVI files on local 
disks. The sequences used in this experiment were 
the same ones used for obtaining measurement for the 
various decompressors; their output characteristics are 



given in Table 3. Table 5 provides performance num- 
bers for the M-JPEG and an unoptimized INDEO 
compressor. For M-JPEG, rates for both monochrome 
and color video sequences are provided. 

The data in Table 5 indicates that the M-JPEG com- 
pression outperforms INDEO (although one has to 
keep in mind that INDEO was not optimized). This 
difference occurs because M-JPEG compression, 
unlike INDEO, does not rely on interframe prediction 
or motion estimation for compression. Furthermore, 
when raw video is compressed from disk, the encoder
performs better than when it is captured and com- 
pressed in real time. This can be explained on the basis 
of the overhead resulting from context switching in 
the operating system and the scheduling of sequential 
capture operation by the applications. Real-time cap- 
ture and compression of image sizes larger than QSIF 
still require hardware assistance. It should be noted 
that in Table 5, the maximum compression rate for 
real-time capture and compression does not exceed 30 
frames per second, which is the limit of the capture
hardware. Since there are no such limitations for disk 
reads, compression rates of greater than 30 frames per 
second for QSIF sequences are recorded. 

With the newer Alpha chip we expect to see
improved performance. A factor we neglected in our 
calculations was prefiltering. Some capture boards are 
capable of capturing only in CCIR601 format and do 
not include decimation filters as part of their hard- 
ware. In such cases, the software has to filter each 
frame down to CIF or QCIF, which adds substantially
to the overall compression time. For applications that 
do not require real-time compression, software 
digital-video compression may be a viable solution 
since video can be captured on fast disk arrays and
compressed later.

Table 5
Typical Number of Frames Compressed per Second

Sample Applications

We implemented several applications to test our archi- 
tecture (codecs and renderer) and to create a test bed 
for performance measurements. These programs also 
served as sample code for software developers incorpo- 
rating SLIB into other multimedia software layers. 

The Video Odyssey Screen Saver 

The Video Odyssey screen saver uses software video 
decompression and 24-bit YCbCr to 8-bit pseudo- 
color rendering to deliver video images to the screen 
in a variety of modes. The program is controlled by 
a control panel, shown in Figure 14. 

The user can select from several methods of display- 
ing the decompressed video or let the computer cycle 
through all methods. The floaters mode, shown in 
Figure 15, floats one to four copies of the video 
around the screen with the number of floating win- 
dows controlled by a slider in the control panel. The 
snapshot mode floats one window of the video around 
the screen, but every second takes a snapshot of a 
frame and pastes it to the background behind the 
floating window. 

All settings in the control panel are saved in a con- 
figuration file in the user's home directory. The user 
selects a video file with the file button. In the current 
implementation, any AVI file containing Motion JPEG 
or raw YUV video is acceptable. The user can set the 
time interval for the screen saver to take over. Controls 
for setting brightness, contrast, and saturation are also 
provided. Video can be played back at normal resolution
or with X2 scaling. Scaling is integrated with
the color conversion and dithering for optimization. A
pause feature allows the user to leave his or her screen
in a locked state with an active screen saver. The screen
is unlocked only if the correct password is provided.

Figure 15
Video Odyssey Screen Saver in Floaters Mode

The Software Video Player 

The software video player is an application for viewing 
video that is similar to a VCR. Like Video Odyssey, the
software video player exercises the decompression and 
rendering portions of SLIB. Unlike Video Odyssey, 
the software video player allows random access to any 
portion of the video and permits single-step, reverse, 
and fast-forward functions. Figure 16 shows the display
window of the software video player.

Figure 14
Video Odyssey Control Panel

Figure 16
The Software Video Player Display Window

The user moves through the file with a scroll bar
and a set of VCR-like buttons. The button on the far 
left of the display window allows the video to be dis- 
played at normal size or at a magnification of X2. The 
far-right button allows adjustment of brightness, con- 
trast, saturation, and number of displayed colors. The 
quality of the dithering algorithm used in rendering is 
such that values as low as 25 colors lead to acceptable 
image quality. Allowable file formats for the software 
video player are M-JPEG (AVI format and the JPEG 
file interchange format or JFIF), MPEG-1 (both video 
and system streams), and raw YUV.

Random access into the file is done in one of two 
ways, depending on the file format. For formats that 
contain an index of the frame positions in the file (like 
AVI files), the index is simply used to seek the desired 
frame. For formats that do not contain an index, such 
as MPEG-1 and JFIF, the software video player esti- 
mates the location of a frame based on the total length 
of the video clip and a running average of frame size. 
This technique is adequate for most video clips and has 
the advantage of avoiding the time needed to first 
build an index by scanning through the file. 
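
For index-free formats, the estimate described above can be sketched as follows. The class and function names are hypothetical, and clamping the result to the file length is an added assumption.

```python
class FrameSizeAverage:
    """Running average of the sizes of the frames decoded so far."""

    def __init__(self):
        self.frames = 0
        self.total_bytes = 0

    def add(self, frame_bytes):
        self.frames += 1
        self.total_bytes += frame_bytes

    def average(self):
        return self.total_bytes / self.frames


def estimate_offset(target_frame, avg_frame_bytes, file_bytes):
    """Guess the byte offset of a frame without building an index first."""
    offset = int(target_frame * avg_frame_bytes)
    return max(0, min(offset, file_bytes - 1))  # stay inside the file
```

The player would seek to the estimated offset and then resynchronize on the next frame boundary in the bit stream; the estimate improves as more frames contribute to the average.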

Interframe compression schemes like MPEG-1 and 
INDEO pose special problems when trying to access 
a random frame in a video clip. MPEG-1's B- and
P-frames are dependent on preceding frames and can- 
not be decompressed alone. One technique for han- 
dling random access into files with non-key frames 
and no frame index is to use the file position specified 
by the user (with a scroll bar or by other means) as a 
starting point and then to search the bit stream for the 
next key frame (an I-frame in MPEG-1). At that point,
display can proceed normally. Reverse play is also a 
problem with these formats. The software video player 
deals with reverse by displaying only the key frames. 
It could display all frames in reverse by predecom- 
pressing all frames in a group and then displaying them 
in reverse order, but this would require large amounts 
of memory and would pose problems with processing 
delays. Rate control functions, including fast-forward
and fast-reverse functions, can be done by selectively
throwing out non-key frames and processing key or
I-frames only.
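
The key-frame handling described in this section can be sketched over a list of frame types. Representing a clip as a list of "I", "P", and "B" markers is an assumption for illustration; a real player would scan the bit stream for picture headers instead.

```python
def next_key_frame(frame_types, start):
    """Index of the first I-frame at or after `start`, or None if there is none."""
    for i in range(start, len(frame_types)):
        if frame_types[i] == "I":
            return i
    return None


def fast_forward(frame_types, start=0):
    """Frames shown in fast-forward: key frames only, in display order."""
    return [i for i in range(start, len(frame_types)) if frame_types[i] == "I"]


def reverse_play(frame_types, start):
    """Frames shown in reverse: key frames only, walking backward from `start`."""
    return [i for i in range(start, -1, -1) if frame_types[i] == "I"]
```

Because only self-contained frames are decoded, none of the three operations needs reference frames held in memory, which is the trade-off against predecompressing a whole group.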

Other Applications 

Several other applications using different components 
of SLIB were also written. Some of these are 
(1) Encode — a video encoding application that uses 
SLIB's compression component to compress raw 
video to M-JPEG format, (2) Rendit — a viewer for 
true color images that uses SLIB's rendering compo- 
nent to scale, tone-adjust, dither, quantize, color space 
convert, and display 24-bit RGB or 16-bit YUV 
images on frame buffers with limited planes, and 
(3) routines for viewing compressed on-line video 



documentation that was incorporated into Digital's 
videoconferencing product. 

Related Work 

While considerable effort has been devoted to opti- 
mizing video decoders, little has been done for video 
encoders. Encoding is generally computationally more 
complex and time-consuming than decoding. As a 
result, obtaining real-time performance from encoders 
has not been feasible. Another rationalization for 
interest in decoders has been that many applications 
require video playback and only a few are based on 
video encoding. As a result, "code once, play many 
times" has been the dominant philosophy. In most 
papers, researchers have focused on techniques for 
optimizing the various codecs; very little has been 
published on providing a uniform architecture and an 
intuitive API for the video codecs. 

In this section, we present results from other papers
published on software video codecs. Of the three
international standards, MPEG-1 has attracted the
most attention, and our presentation is biased slightly
toward this standard. We concentrate on work that 
implements at least one of the three recognized inter- 
national standards. 

The JPEG software was made popular by the 
Independent Software JPEG Group formed by Tom 
Lane. He and his colleagues implemented and made
available free software that could perform baseline JPEG
compression and decompression. Considerable atten- 
tion was given to software modularity and portability. 
The main objective of this codec was still-image com- 
pression although its modified version has been used for 
decompression of motion JPEG sequences as well. 

The MPEG software video decoder was made popu- 
lar by the multimedia research group at the University 
of California, Berkeley. The availability of this free soft- 
ware sparked the interest of many who now had the 
opportunity to play with and experiment with com- 
pressed video. Patel et al. describe the implementation 
of this software MPEG decoder. The focus in their
paper is on an MPEG-1 video player that would 
be portable and fast. The authors describe various 
optimizations, including in-line procedures, custom 
coding frequent bit-twiddling operations, and rendering
in the YUV space with color conversion through
look-up tables. They observed that the key bottleneck
toward real-time performance was not the compu- 
tation involved but the memory bandwidth. They 
also concluded that data structure organization and 
bit-level manipulations were critical for good perfor- 
mance. The authors propose a novel metric for com- 
paring the performance of the decoder on systems 
marketed by different systems vendors. Their metric, 
the percentage of required bit rate per second per
thousand dollars (PBSD), takes into account the price
of the system on which the decoder is being evaluated. 

Bheda and Srinivasan describe the implementa- 
tion of an MPEG-1 decoder that is portable across 
platforms because the software is written entirely in 
a high-level language. The paper describes the various
optimizations done to improve the decoder's
speed and provides performance numbers in terms of 
number of frames displayed per second. The authors 
compare the speed of their decoder on various 
platforms, including Digital's first Alpha-based PC run- 
ning Microsoft's Windows NT system. They conclude 
that their decoder performed best on the Alpha system. 
It was able to decompress, dither, and display a 320-pixel
by 240-line video sequence at a rate of 12.5 frames
per second. A very brief description of the API sup- 
ported by the decoder is also provided. The API is able 
to support operations such as random access, fast for- 
ward, and fast reverse. Optional skipping of B-frames is 
possible for rate control. The authors conclude that the 
size of the cache and the performance of the display sub- 
system are critical for real-time performance. 

Bhaskaran and Konstantinides describe a real- 
time MPEG-1 software decoder that can play both 
audio and video data on a Hewlett-Packard PA-RISC
processor-based workstation. The paper provides
step-by-step details on how optimization was carried 
out at both the algorithmic and the architectural 
levels. The basic processor was enhanced by including 
in the instruction set several multimedia instructions 
capable of performing parallel arithmetic operations 
that are critical in video codecs. The display subsystem 
is able to handle color conversion of YCbCr data and 
up-sampling of image data. The performance of the 
decoder is compared to software decoders running on 
different platforms from different manufacturers. The 
comparison is not truly fair because the authors com- 
pare their decoder, which has hardware assistance 
available to it (i.e., an enhanced graphic subsystem and 
new processor instructions), to other decoders that are 
truly software based. Furthermore, since all the codecs 
were not running on the same machine under similar 
operating conditions and since the sequence tested on 
their decoder is not the same as the one used by the 
others, the comparison is not truly accurate. The paper 
does not provide any information on the program- 
ming interface, the control flow, and the overall soft- 
ware architecture. 

There are numerous other descriptions of the 
MPEG-1 software codecs. Eckart describes a software 
MPEG video player that is capable of decoding both 
audio and video in real time on a PC with a 90-megahertz
Pentium processor. Software for this decoder is
available freely over the Internet. Gong and Rowe 
describe a parallel implementation of the MPEG-1 



encoder that runs on a network of workstations.
Performance improvements of greater than 650
percent are reported when the encoding process is 
performed on 9 networked HP 9000/720 systems 
as compared to a single system. 

Wu et al. describe the implementation and per- 
formance of a software-only H.261 video codec on 
the PowerPC 601 reduced instruction set computer 
(RISC) processor. This paper is interesting in that it
deals with optimizing both the encoder and the 
decoder to facilitate real-time, full-duplex network 
connections. The codec plugs under the QuickTime 
architecture developed by Apple Computer, Inc. and 
can be invoked by applications that have programmed 
to the QuickTime interface. The highest display rate is 
slightly under 18 frames per second for a QSIF video 
sequence coded at 64 kilobits per second with disk 
access. With real-time video capture included, the 
frame rate reduces to between 5 and 10 frames per
second. The paper provides an interesting insight by 
giving a breakdown of the amount of time spent in 
each stage of coding and decoding on a complex 
instruction set computer (CISC) versus a RISC system. 
Although the paper does a good job of describing the 
optimizations, very little is mentioned about the software
architecture, the programming interface, and the
control flow. 

We end this section by recommending some sources 
for obtaining additional information on the state 
of the art in software-only video in particular and in 
multimedia in general. First, the Society of Photo- 
Optical Instrumentation Engineers (SPIE) and the 
Association for Computing Machinery (ACM) sponsor
annual multimedia conferences. The proceedings from 
these conferences provide a comprehensive record of 
the advances made on a year-to-year basis. In addition, 
both the Institute of Electrical and Electronics 
Engineers (IEEE) and ACM regularly publish issues 
devoted to multimedia. These special issues contain 
review papers with sufficient technical details.
Finally, an excellent book on the subject of video compression
is the recently published Digital Pictures (second
edition) by Arun Netravali and Barry Haskell from
Plenum Press.

Conclusions 

We have shown how popular video compression 
schemes are composed of an interconnection of dis- 
tinct functional blocks put together to meet specified 
design objectives. The objectives are almost always set 
by the target applications. We have demonstrated that 
the video rendering subsystem is an important compo- 
nent of a complete playback solution and presented 
a novel algorithm for mapping out-of-range colors.
We described the design of our software architecture
for video compression, decompression, and playback. 
This architecture has been successfully implemented 
over multiple platforms, including the Digital UNIX, 
the OpenVMS, and Microsoft's Windows NT operat- 
ing systems. Performance results corroborate our 
claim that current processors can adequately handle 
playback of compressed video in real time with little or
no hardware assistance. Video compression, on the 
other hand, still requires some hardware assistance for 
real-time performance. We believe the widespread use 
of video on the desktop is possible if high-quality 
video can be delivered economically. By providing 
software-only video playback, we have taken a step in 
this direction.